aspect an access control list CAM (ACLCAM) contains
masked flow information. The ACLCAM provides an index
to internal token bucket counters and preconfigured contract
values of an aggregate flow table which becomes affected by
the packet statistics. In this way flows are aggregated for
assignment of output queues and thresholds, possible dropping
and possible modification of packets. In another aspect
the CAM contains active flow information, the ACLCAM
and the aggregate flow table are combined in one system and
used to produce in parallel a pair of traffic rate limiting and
prioritizing decisions for each packet. The two results are
then resolved to yield a single result.

24 Claims, 7 Drawing Sheets 




01/16/2004, EAST Version: 1.4.1 



US 6,643,260 B1









U.S. Patent Nov. 4, 2003 Sheet 1 of 7 US 6,643,260 B1

FIG. 1A: [Layer 2 packet 8, fields in order: MAC destination
address 10, MAC source address 12, optional flow information 18,
data 14, CRC 16]

FIG. 1B: [Ethernet frame: DA, SA, ET/LEN, data, CRC. The 4-byte
802.1Q tag holds Ether type, PRI, type and VLAN ID]



FIG. 1C: [IP packet layout. Layer 3: 4-bit version, 4-bit header
length, ToS value 26, length (in bytes), 16-bit identification,
3-bit flags, 13-bit fragmentation offset, TTL, prot_typ 28, csum,
source IP address 30, destination IP address 32, options (if any).
TCP/UDP only: source port number 34, destination port number 36.
TCP only: 32-bit sequence number, 32-bit acknowledge number,
4-bit header length, reserved (6 bits), flag bits (urg, ack, psh,
rst, syn, fin), 16-bit window size, 16-bit TCP checksum, 16-bit
urgent pointer]



FIG. 2: [Processing sequence: packet arrives 60; input queue
scheduling 61 (drop packet if input threshold based on codepoint
is exceeded, 62); input classification: determine codepoint of
packet 63; input rate limitation 64 (drop packet or choose
alternate codepoint for packet based on stored policy, 65);
forwarding decision 66; packet rewrite: rewrite CoS/ToS of packet
where appropriate and desirable 68; output queue scheduling:
direct packet to appropriate output port queue 70; send packet to
next node 72]

FIG. 3: [Output queue regions: drop, random drop, no drop]



FIG. 4A: [System block diagram. Labelled elements: L3 CAM; hash
block; hash index; L3 Table; token bucket; contract value + ToS;
TCAM L3 Table lookup; L3 Table rate limitation; TCAM aggregate
table lookup; aggregate table; token bucket; index; contract
value + ToS, priority; aggregate table rate limitation. Labelled
signals: L3TOS, L3DROP, L3ALTER, AGTOS, AGDROP, AGALTER]

FIG. 4B: [Labelled elements: L3TOS, AGTOS, compare mask, output
TOSL3AG]



FIG. 4C: [Labelled elements: TOSL3AG, MASK, L3TOS, ORIGTOS,
8:3 CoS map, ORIGCOS, CoS F, PRIORITY, ToS F, DROP F, and
"do not change" inputs]

FIG. 5A: [Flow diagram, first part. Labelled steps: start; packet
arrives; extract ToS or equivalent field if present; parse packet
for micro flow; L3 Table branch; aggregate table branch; derive
OPQSEL and OPQTH (in each branch); go to A; alternative: go to B
to bypass process]



FIG. 5B: [Flow diagram, second part. Labelled steps:
BYTECOUNT 2 = BYTECOUNT 1 + MAX(MINBYTES, PKTBYTES);
BYTECOUNT 3 = BYTECOUNT 1; has timestamp interval been crossed?;
BYTECOUNT 2 = BYTECOUNT 2 - (LEAKRATE * INTERVALS)]

FIG. 5C: [Flow diagram, third part. Labelled steps: decision
comparing BYTECOUNT with CONTRACT VALUE; commit to table:
BYTECOUNT 1 = BYTECOUNT 2; commit to table: BYTECOUNT 1 =
BYTECOUNT 3 - (LEAKRATE * INTERVALS); ToS -> MAPPED[ToS] (CoS is
a map of ToS and contains OPQSEL and OPQTH); go to C]



FIG. 5D: [Flow diagram, fourth part. Labelled steps: resolve two
ToS (L3TOS and AGTOS); select no port; packet sent to selected
output port; calculated ToS sent to port; port selects queue
based on codepoint; port optionally implements WRED on selected
queue; rewrite packet's DS/ToS field; rewrite packet's
802.1Q/ISL CoS field; end]




METHOD AND APPARATUS FOR 
IMPLEMENTING A QUALITY OF SERVICE 
POLICY IN A DATA COMMUNICATIONS 
NETWORK 

BACKGROUND OF THE INVENTION 

1. Field of the Invention 

This invention relates to the field of data communications 
networks. More particularly, this invention relates to a 
method and apparatus for implementing a quality of service 
(QoS) policy in a data communications network so as to 
thereby prioritize network traffic into a plurality of service 
levels and provide preferential treatment of different classes 
of data traffic on the data communications network. A 
number of priority levels may be implemented in accordance 
with the invention. 

2. Background 

This invention relates to switched packet data communi- 
cations networks. There are a number of different packet 
types which are used in modern switched packet data 
communications networks. 

FIG. 1A depicts a generic packet 8 using Layer 2 encap- 
sulation. A number of different Layer 2 encapsulation pro- 
tocols are recognized. Each may include a MAC (media 
access control) destination address 10 and a MAC source 
address 12. The data 14 may include Layer 3 encapsulated 
packet information. A CRC (cyclic redundancy check) 16 
may also be provided at the end of the Layer 2 encapsula- 
tion. The unlabeled block 18 may include an Ethernet type 
for Ethernet V 2.0 (ARPA) packets. The Ethernet type may 
include IPv4 (IP), IPX, AppleTalk, DEC Net, Vines IP/Vines 
Echo, XNS, ARP or RARP. Other known encapsulations 
include SAP, SAP1, SNAP and the like. The meaning of the 
bits in and the size of block 18 differs among the different 
encapsulation protocols. This information is sometimes 
referred to as the Layer 2 Flow Information. 

One special case of Layer 2 encapsulation is the IEEE
802.1q frame shown schematically in FIG. 1B. The IEEE
802.1q frame (or packet) 20 has a MAC Destination Address
("DA") 10, MAC Source Address ("SA") 12, Data Portion
14 and CRC 16. In addition, within block 18 is the IEEE
802.1q "tag" 22 which includes, among other items, a block
of three priority bits 24. These three bits are also known as
a "Class of Service" or "CoS" field.

FIG. 1C depicts the Layer 3 and Layer 4 structure of a
typical IP packet. The IP packet format will be detailed here
by way of example because it is presently one of the most
common Layer 3 packet types. The fields of importance to
this disclosure are the "ToS value" or type of service 26
which is a preferably 8-bit field also known as the
Differentiated Service ("DS") field, "prot-typ" or IP protocol type
28 (typically either TCP (transmission control protocol) or
UDP (user datagram protocol)), the Source IP address 30
(usually the IP address of the originating station), the
Destination IP address 32 (usually the IP address of the ultimate
destination station), the Layer 4 source port number 34
(available for TCP and UDP packets only) and the Layer 4
destination port number 36 (available for TCP and UDP
packets only). The Layer 3 flow information includes the
information before the source port number 34. The Layer 4
flow information includes the Source and Destination ports
34, 36. The Layer 4 flow information may be used to identify
a particular packet flow as being the product of (source port)
or directed to (destination port) a particular application. The
ToS and CoS fields are used by routers of the data
communications network to provide priority/delay/dropping
services.
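
As a minimal sketch of where these two fields sit in a frame, the following Python reads the three CoS priority bits from an 802.1q tag and the ToS byte from an IPv4 header. It assumes an Ethernet frame carrying at most one tag, with the tag marked by the value 0x8100 directly after the two MAC addresses; the function name `extract_cos_tos` and its return shape are illustrative and do not come from this disclosure.

```python
import struct

TPID_8021Q = 0x8100      # value that marks an 802.1q tag (assumed single tag)
ETHERTYPE_IPV4 = 0x0800  # Ethernet type for IPv4


def extract_cos_tos(frame: bytes):
    """Return (cos, tos) for a raw Ethernet frame; None where a field is absent."""
    cos = None
    tos = None
    offset = 12  # skip the 6-byte MAC destination and 6-byte MAC source
    (ethertype,) = struct.unpack_from("!H", frame, offset)
    offset += 2
    if ethertype == TPID_8021Q:
        # 16-bit tag control word: 3 priority bits, 1 flag bit, 12-bit VLAN id
        (tci,) = struct.unpack_from("!H", frame, offset)
        cos = tci >> 13
        offset += 2
        (ethertype,) = struct.unpack_from("!H", frame, offset)
        offset += 2
    if ethertype == ETHERTYPE_IPV4:
        # Byte 0 of the IP header is version/header length; byte 1 is ToS
        tos = frame[offset + 1]
    return cos, tos
```

A device that classifies on these fields would typically fall back to a configured default when either value comes back as `None`.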

As the use of data communications networks increases
worldwide, congestion of those networks has become a
problem. A given data communications network, a given
node on a data communications network, or a given link
connecting two nodes has a certain capacity to pass data
packets and that capacity cannot be exceeded. When data
traffic on the data communications network becomes heavy
enough that one can anticipate congestion problems, it is
desirable to implement a "Quality of Service" or QoS policy
so as to give priority to certain types of traffic and restrict the
flow of other types of traffic, thus assuring that critical
communications are able to pass through the data
communications network, albeit at the expense of less critical
communications.

One of the problems that network devices face in
implementing quality of service solutions is in identifying and
grouping transmissions to be given preferential treatment or
to be restricted, that is, to prioritize the traffic in accordance
with the Quality of Service policy established for the
network. This becomes especially critical as bandwidth
increases substantially over certain links while other links
remain relatively slow resulting in traffic speed mismatches
which, in turn, cause bottlenecks to data traffic over the
relatively slow links. Such groupings must be consistently
applied to traffic and must be applied at the rate that the
traffic is passing without introducing additional delays or
bottlenecks. Such groupings may be, for example, by
protocol type, by destination IP address, by source IP address,
by destination/source IP address pair, by source port and/or
destination port (Layer 4), and the like.

Routers have, in the past, kept packet counts and rate
limited packets in software, but router software has not
scaled to the level of being able to process millions of
packets per second through a node, providing the basic
routing functions that they are required to provide and being
able to also provide the rate limitation function.

One approach to identifying and grouping transmissions
is for the host to categorize packets by use of the L2 CoS
field, L3 ToS field or both. The primary disadvantage of this
approach is that it removes control from the system
administrator and requires one to trust the end stations to the
communication to properly implement the QoS policy. In
some cases this trust cannot be justified. In addition, an end
station only sees its own packets and therefore is unaware of
the overall resource requirements within the data
communications network and cannot make allowances for these
requirements.

Accordingly, a Quality of Service policy controlled by a
network system administrator is needed together with a
mechanism for applying it at the full data rate of the data
communications network.

SUMMARY OF THE INVENTION 

In a first aspect of the invention a content addressable
memory (CAM or L3 Table) contains flow information for
each active flow of packets passing through a given node of
a data communications network. The CAM has associated
with each entry (corresponding to each active flow) a packet
counter, a number of bytes seen counter, a token bucket and
a contract value or committed access rate. Each flow is
assigned one of a plurality of output queues and optionally
at least one output queue threshold value. A token bucket
algorithm is employed on each flow to determine whether
packets from that flow exceed a committed access rate. Such
packets may be dropped or optionally modified to reflect an
alternate output queue and/or alternate output queue
threshold value before being sent to the selected output queue for
transmission from the node.
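
The per-flow bookkeeping described above can be illustrated with a short Python sketch. It is only a software model under stated assumptions: the bucket is kept as a byte count that drains at a fixed amount per elapsed interval, short packets are charged at a minimum size, and a packet is out of profile when charging it would push the count above the contract value. The quantity names loosely follow the labels in FIGS. 5B and 5C (byte count, leak rate, minimum bytes, contract value), but the class `FlowEntry`, the function `police`, the clamp at zero and the default of 64 bytes are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class FlowEntry:
    contract_bytes: int      # contract value (committed access rate budget)
    leak_per_interval: int   # bytes drained from the bucket per interval
    byte_count: int = 0      # current bucket level
    packets_seen: int = 0    # packet counter
    bytes_seen: int = 0      # number-of-bytes-seen counter


def police(entry: FlowEntry, pkt_bytes: int, intervals_elapsed: int,
           min_bytes: int = 64) -> bool:
    """Update the entry for one packet; return True if the packet is in profile."""
    entry.packets_seen += 1
    entry.bytes_seen += pkt_bytes
    # Drain the bucket for the time that has passed, never below empty.
    drained = max(0, entry.byte_count - entry.leak_per_interval * intervals_elapsed)
    # Tentatively charge the packet, counting short packets at a minimum size.
    charged = drained + max(min_bytes, pkt_bytes)
    if charged > entry.contract_bytes:
        entry.byte_count = drained   # out of profile: the packet is not charged
        return False
    entry.byte_count = charged       # in profile: commit the new level
    return True
```

A caller would drop the packet, or assign it an alternate output queue or threshold, whenever `police` returns `False`.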

In a second aspect of the invention an access control list 
CAM (ACLCAM) contains masked flow information such 
as, for example, all or portions of IP source and/or destina- 
tion addresses, protocol types and the like. The ACLCAM 
provides single clock cycle accesses when performing look- 
ups for each packet. The ACLCAM provides an N-bit index 
value in response to QoS lookups based upon the best match 
for the current packet. 

The best match is order dependent for the entry in the
ACLCAM and may represent any fields in the packet upon
which the administrator of the data communications network
wishes to base traffic rate limiting and prioritizing decisions.
A plurality of ACLCAM entries can yield the same N-bit
index value. The N-bit ACLCAM index selects one of 2^N
internal counters and associated preconfigured contract
values, which become affected by the packet statistics. A
token bucket algorithm is employed on these counters as
discussed above.
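
A rough software model of this lookup is sketched below in Python. It assumes a 32-bit key (for instance a destination IP address), entries written as (value, mask, index) triples, and N = 2; these choices, and the names `acl_lookup` and `account`, are hypothetical. The loop models only the order dependence of the match; it says nothing about how the hardware achieves a lookup in a single clock cycle.

```python
from typing import List, Optional, Tuple

# Each entry is (value, mask, index): a key matches when it agrees with
# `value` on every bit set in `mask`. Entries are searched in order.
AclEntry = Tuple[int, int, int]


def acl_lookup(entries: List[AclEntry], key: int) -> Optional[int]:
    """Return the index of the first entry matching `key`, or None."""
    for value, mask, index in entries:
        if (key & mask) == (value & mask):
            return index
    return None


N_BITS = 2                       # width of the index (assumed for the sketch)
counters = [0] * (2 ** N_BITS)   # one byte counter per possible index value


def account(entries: List[AclEntry], key: int, pkt_bytes: int) -> Optional[int]:
    """Add the packet's bytes to the counter selected by the matching entry."""
    index = acl_lookup(entries, key)
    if index is not None:
        counters[index] += pkt_bytes
    return index
```

Because several entries may carry the same index, traffic matching any of them accumulates in one shared counter, which is how unrelated flows can be policed together.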

The ACLCAM may also be used to determine the QoS
parameters for new entries in the L3 Table as they are
created. In addition, it is used to select an entry in the
aggregate flow table described below.

In a third aspect of the invention, an aggregate flow table 
contains information specifying plural flows — for example 
all traffic between x and y regardless of type, all traffic to x 
of a certain type, all traffic from anyone of a certain type, and 
the like. These specifications may specify more than one 
flow. This is possible because each entry has a correspond- 
ing flow mask. This is different from the L3 Table which 
may identify certain specific flows only, i.e., all traffic of 
protocol type HTTP from x to y. Since the entire L3 Table 
operates with a single flow mask, each entry will have 
identical specificity, thus, there could be multiple entries for 
traffic from x to y if such traffic includes multiple protocol 
types and the flow mask does not mask the protocol type, for 
example. 
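
The difference between a single table-wide flow mask and a per-entry flow mask can be shown with a small Python sketch. The three-field flow key, the use of "1" and "0" to mark kept and masked fields, and the names `Flow`, `apply_mask` and `aggregate_match` are assumptions made for illustration only.

```python
from typing import NamedTuple, Optional


class Flow(NamedTuple):
    src: str
    dst: str
    proto: str


def apply_mask(flow: Flow, mask: Flow) -> Flow:
    """Keep a field where the mask holds '1'; replace it with '*' otherwise."""
    return Flow(*(f if m == "1" else "*" for f, m in zip(flow, mask)))


# L3 Table: one flow mask shared by every entry (here nothing is masked out).
L3_MASK = Flow("1", "1", "1")

# Aggregate flow table: each entry carries its own mask and masked pattern.
AGGREGATE = [
    (Flow("1", "1", "0"), Flow("x", "y", "*")),     # x to y, any protocol type
    (Flow("0", "1", "1"), Flow("*", "x", "http")),  # HTTP to x from anyone
]


def aggregate_match(flow: Flow) -> Optional[int]:
    """Return the position of the first aggregate entry covering `flow`."""
    for position, (mask, pattern) in enumerate(AGGREGATE):
        if apply_mask(flow, mask) == pattern:
            return position
    return None
```

With the unmasked L3 mask, HTTP and FTP traffic from x to y produce two distinct keys and so two L3 Table entries, while both fall under the first aggregate entry.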

In a fourth aspect of the invention, the CAM, an aggregate 
flow table and the ACLCAM are combined in one system 
and used to produce, in parallel, a pair of traffic rate limiting 
and prioritizing decisions for each packet. The two results 
are then resolved (if in conflict) to yield a single result which 
is acted upon. The result is to modify or not modify the 
packet's CoS and/or ToS (or other) fields and to drop or pass 
the packet onto the next node of the data communications 
network. 

BRIEF DESCRIPTION OF THE DRAWINGS 

FIG. 1A is a diagram showing the structure of a typical
Ethernet packet.

FIG. 1B is a diagram showing the structure of a typical
Ethernet packet including the IEEE 802.1q tag.

FIG. 1C is a diagram showing the structure of a Layer 3
IP packet.

FIG. 2 is a block diagram showing the implementation of
a Quality of Service policy.

FIG. 3 is a diagram showing the functional operation of
an output queue implementing threshold-based dropping.

FIGS. 4A, 4B and 4C are a system block diagram of an
apparatus in accordance with a presently preferred
embodiment of the present invention.

FIGS. 5A, 5B, 5C and 5D are a flow diagram of packet
processing in accordance with a presently preferred
embodiment of the present invention.

DESCRIPTION OF THE PREFERRED
EMBODIMENTS

Those of ordinary skill in the art will realize that the
following description of the present invention is illustrative
only and not intended to be in any way limiting. Other
embodiments of the invention will readily suggest
themselves to such skilled persons from an examination of the
within disclosure.

In accordance with a presently preferred embodiment of
the present invention, the components, process steps, and/or
data structures are implemented using ASIC technology.
This implementation is not intended to be limiting in any
way. Different implementations may be used and may
include various types of operating systems, computing
platforms, and/or computer programs. In addition, those of
ordinary skill in the art will readily recognize that devices of
a more general purpose nature, such as hardwired devices,
devices relying on FPGA technology, and the like, may also
be used without departing from the scope and spirit of the
inventive concepts disclosed herewith.

Introduction 

Unless data communications networks are provisioned
with large excess bandwidth, there may be times when the
offered load at a given link will be greater than the capacity
of that link. This results in greater than average packet delay
or even dropping packets in excess of the link capacity.
While this may be acceptable to many applications, it can
effectively result in loss of service for others. Furthermore,
user policies may dictate that some traffic is more important
than other traffic and should, therefore, be given preferential
treatment in these congestion situations. Even in
uncongested conditions, it may be desirable to give preferential
treatment to traffic with more stringent real time
requirements (such as voice and video) so as to avoid the delay
incurred waiting for the transmission of long packets with
less stringent real time requirements.

Providing preferential forwarding treatment to some
traffic, perhaps at the expense of other traffic, is referred to
by the general term Quality of Service (QoS). QoS
comprises four distinct functions. These are (1) Classification;
(2) Rate Limitation; (3) Packet Rewrite; and (4) Scheduling.

Classification is the process by which a QoS label is
assigned to a packet. The QoS label is represented by a
codepoint used internally by the device which determines
how the packet is treated as it flows through the device. A
codepoint is an integer or other representation representing
the QoS classification that the device assigned the packet to.
The codepoint also determines the value written into the
packet's CoS (for 802.1q packets) and ToS (for IP packets)
fields.

CoS means Class of Service. This is the name given to
three bits in the Layer 2 header (CoS 24 in FIG. 1B) that
indicate the QoS assigned to this packet. These three bits are
located in the 802.1q header for 802.1q-encoded packets and
in the user field of the ISL (Inter-Switch Link) header for
ISL-encapsulated packets. Those of skill in the art will
realize that the present invention will work essentially
interchangeably with 802.1q-tagged frames and ISL
frames as well as any other scheme including QoS encoding.

ToS means Type of Service. It is a preferably 1 byte (8-bit)
field in the IP header (ToS 26 in FIG. 1C) that indicates the
QoS assigned to an IP packet. Since the ToS field is not
available for all packet types, the CoS field is also used. ToS
is in the process of being redefined as "Differentiated
Services" (DS). The ToS/DS field can select among up to
256 (2^8) different queues, for example.

Input to the classification function includes user policies
that map ACEs to codepoints. ACE means access control
entry. It is a filter that is used to identify flows with certain
characteristics. It includes fields such as device input and/or
output ports, input and/or output VLANs, Layer 2
addresses, Layer 3 addresses, TCP/UDP (Layer 4) port
numbers, and the like.

A committed access rate (CAR) is the bandwidth that a
specific flow or group of flows will be limited to. The CAR
can be enforced by rate limitation (dropping out-of-profile
packets under certain levels of congestion in accordance
with a selected algorithm) or by shaping.

The result of classification is a codepoint assigned
(internal to the device) to the packet. (Depending on the user
policies, it may simply be set to the CoS or ToS or other field
initially taken from the packet).

There are three ways that user policy can control the
classification of a packet.

1. By specifying the codepoint for a port (e.g., a particular
hardware port of the device, a device input subnet or a
device input VLAN);

2. By specifying the codepoint for packets with a specific
MAC destination address in a specific VLAN; and

3. By specifying the codepoint for packets matching a
specific ACE.

In accordance with a presently preferred embodiment of
the present invention, the algorithm for determining the
codepoint of a packet consists of three distinct steps.

First, the packet is classified on the basis of the input port.
If the port is a trunk port (carrying traffic from multiple
VLANs) and the packet has an ISL or 802.1q header then the
classification of the packet is the CoS of the packet. If the
port is a trunk port and the packet does not have an ISL or
802.1q header, or the port is an access port (carrying traffic
for a single VLAN), then the classification of the packet is
set to the CoS value configured on that port. Each port is
configured with a single CoS value.

Second, a check is made to see if a CoS has been
explicitly configured for the packet's MAC destination
address. If it has then the packet is assigned the CoS
configured for that address replacing the previously assigned
CoS.

Third, a check is made to see if it matches any of the
configured ACEs. If the packet matches a configured ACE
then the packet is assigned the CoS corresponding to that
ACE, replacing the previously assigned value. Once a
matching ACE is found, no others are checked. Thus, the
order of the checking of the ACEs matters or a mechanism
is required to resolve multiple matches.

Rate limitation or traffic restriction is the process by
which the switch limits the bandwidth consumed by an
individual flow or a group of flows counted together as an
"aggregate". Rate limitation consists of two stages. The first
stage determines if a given packet is in profile or out of
profile, that is, whether or not the packet causes the flow (or
aggregation of flows) to exceed its allotted bandwidth
(CAR). An "in profile" packet is a packet that does not
cause the CAR of the packet's flow to be exceeded. An "out
of profile" packet is the converse. The second stage of rate
limitation assigns an action to apply to packets that are out
of profile. This action may be either to reassign the packet
a new codepoint, or to drop the packet. Input to the rate
limitation function includes: (1) user policies in terms of
ACEs and QoS parameters; (2) device input port (or subnet
or VLAN); (3) the codepoint the switch assigned to the
packet; and (4) the packet flow including layers 2, 3 and 4.
The output is a new codepoint, which may be either the
original one or the new one, and a boolean value to indicate
whether or not to drop the packet. If the packet is dropped
the new codepoint is irrelevant.

Packet rewrite is the process by which the device writes
a new CoS 24 (for all IEEE 802.1q packets) and/or ToS 26
(for IP packets only) into the packet. The rewrite values are
derived from the codepoint, preferably through a
conventional mapping function. Input to the rewrite function is the
packet's codepoint, the codepoint to CoS mapping and the
codepoint to ToS mapping. Other types of packets may
employ packet rewrite such as ISL encapsulated packets and
the like.

Depending on how the packet is classified, the rewrite
function rewrites either the packet CoS 24 or both the CoS
24 and the IP ToS 26. If the packet is classified on the basis
of an IP ACE, both the CoS and the ToS are rewritten.

Note that for packets going out an access port or where the
packet's VLAN is the native VLAN of a trunk port, the
packet may be transmitted without an ISL or 802.1q header.
In this case the CoS value is lost once the packet leaves the
device. However, the CoS value is still used internally by the
device in performing the scheduling function.

Scheduling includes: the process by which the device
picks an output queue for the packet, the policy for dropping
packets when the queue exceeds some threshold (tail drop
(dropping packets while the threshold is exceeded), RED
(random early detection), etc.) and the algorithm for
servicing the various queues (priority queueing, WRR (weighted
round robin), etc.). Input to the scheduling function includes
user policies that specify queue and scheduling
configuration; user policies that map codepoints to queues; and the
codepoint that was the output of the rate limitation function,
i.e., the packet's current codepoint. The packet is enqueued
on the appropriate queue or (perhaps randomly) dropped if
the rate exceeds the CAR for this codepoint.

The processing of the packet is diagrammed at FIG. 2.
The first operation after arrival of the packet at block 60 is
preferably an input queue scheduling process 61 where
packets can be dropped (at reference numeral 62) at the input
to the device under congestion conditions if an input
threshold based upon the codepoint is exceeded. The next
operation is preferably input classification 63 since it is not
generally possible to do any of the other functions before the
packet has been classified. In classification the codepoint
(from which may be derived the ToS and/or CoS) of the
packet is determined. The codepoint is determined for all
packets even if they are not packets which normally include
ToS and/or CoS fields and these ToS/CoS values are used
internally to the device.

Immediately after input classification is input rate
limitation 64 where at block 65 the packet may be dropped or its
codepoint altered based upon stored policies configurable by
an administrator of the system. For example, if an
out-of-profile packet arrives, it may be dropped, or its codepoint
may be altered to make it more likely to be dropped down
the line.

Following input rate limitation is a forwarding decision
66. The forwarding decision 66 is not a part of the QoS, but
it determines the output port of the device to use which, in
this general model, is a parameter to the output queue
scheduling process 70 discussed below.

Following this is the Packet Rewrite operation 68 where
the CoS and/or ToS or other field of the packet is rewritten
if appropriate and desirable.
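
The three-step determination of a packet's classification described above (input port, then MAC destination address, then the first matching ACE) can be sketched in Python as follows. This is an illustrative model only: the `Packet` and `Policy` structures, the representation of an ACE as an ordered (filter, CoS) pair and the function name `classify` are assumptions, not structures taken from this disclosure.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Packet:
    in_port: str
    vlan: int
    dst_mac: str
    tagged_cos: Optional[int] = None   # CoS from an ISL/802.1q header, if any
    proto: str = ""


@dataclass
class Policy:
    trunk_ports: set
    port_cos: Dict[str, int]                          # one CoS per port
    mac_cos: Dict[Tuple[str, int], int]               # (dst MAC, VLAN) -> CoS
    aces: List[Tuple[Callable[[Packet], bool], int]]  # ordered (filter, CoS)


def classify(pkt: Packet, policy: Policy) -> int:
    # Step 1: use the tagged CoS on a trunk port, else the port's configured CoS.
    if pkt.in_port in policy.trunk_ports and pkt.tagged_cos is not None:
        cos = pkt.tagged_cos
    else:
        cos = policy.port_cos[pkt.in_port]
    # Step 2: a CoS configured for the destination MAC address replaces it.
    cos = policy.mac_cos.get((pkt.dst_mac, pkt.vlan), cos)
    # Step 3: the first matching ACE replaces it; later ACEs are not checked.
    for matches, ace_cos in policy.aces:
        if matches(pkt):
            return ace_cos
    return cos
```

Placing a narrow ACE before a broader one, as an administrator might for voice ahead of general real-time traffic, shows why the order of the ACE list matters.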

Next, output queue scheduling 70 is performed and the
packet is directed to an appropriate queue of the previously
selected output port based upon the codepoint determined
for the packet in the classification operation or the packet's
subsequently altered codepoint. The threshold for the output
queue is also selected here.

Finally, at 72 the packet is sent to the next node in the data
communications network.

In an alternative embodiment, the device output port (or
subnet or VLAN) could be a parameter to the classification
function and, thus, a second classification function and a
second rate limitation operation could be applied after the
forwarding decision.

Output scheduling depends upon the capabilities of the
output port. Most prior art ports are FIFO (first in, first out)
ports. Such ports are not capable of output scheduling in
accordance with all aspects of the present invention. In
accordance with one presently preferred embodiment of the
present invention, an output port having two queues each
with two configurable WRED (weighted random early
detection) classes is used. WRED is well known to those of
ordinary skill in the art. Each CoS is mapped as desired to
one of those WRED classes. For each class there is
preferably a low threshold Th1 and a high threshold Th2. The low
value Th1 specifies the average depth of the queue below
which packets that map to that threshold's class will not be
dropped. The high value Th2 specifies the average queue
depth above which packets will be dropped with probability
1. For average queue depths between the low and high
values packets are randomly dropped. This is shown in FIG.
3 for one class. It is possible to set the high and low values
for each threshold to be the same or nearly the same. The
result is a queue with four thresholds with tail drop or near
tail drop performance when a threshold is hit. Tail drop
means dropping all packets while the threshold is exceeded
and no packets while the threshold is not exceeded.

In accordance with another presently preferred
embodiment of the invention, packets are queued for transmission
in one of a plurality of output queues. For example, two
queues could be used, for example, a high priority queue and
a low priority queue, or a voice queue and a data queue.
Many queues could also be used to give more range to the

queue will be dropped well before high congestion is
experienced on the data communications network. On the
other hand, a much higher value for Th2 is appropriate for
mission critical frames, for example: do not drop until the
queue is 100% full.

In effect, this system allows for certain high priority traffic
to get through at the expense of other traffic in a device
having multiple output queues and/or multiple thresholds
rather than being subjected to a pure FIFO processing
modality.

The output queue select (OPQSEL) value derived from
the codepoint determines the queue selected for a multiple
queue environment. For example, one might want to assign
a relatively high priority to frames carrying voice
information such as in IP telephony. This would minimize dropouts
and pauses and make the conversation appear to be
occurring essentially in real time without significant delays.
Similarly, IP video frames might be handled in the same or
a similar way. Alternatively, one might assign the high
priority queue to users paying an additional fee to obtain the
use of the high priority queue. Those of ordinary skill in the
art will realize that many possibilities are capable of being
implemented with the present invention.

Detailed Implementation

Content addressable memories ("CAMs") are well known
to those of ordinary skill in the art. Such memories are
typically a fixed number of bits wide and a fixed number of
addresses long. For example, a CAM might be 80 bits wide
by 8K (8192) addresses long. A binary CAM would include
at each bit position a capability of comparing a data vector,
say 80 bits long, against a programmed content of the
particular address. In a binary CAM, the data vector would
simply be compared binary bit for binary bit to the binary
contents of the given address and a determination would be
rendered as to whether a match existed or not. A ternary
CAM or "TCAM" adds a capability of comparing its
contents not only to a data vector having 0 or a 1 at each bit
position but also to a bit position having a wild card or
"don't care" condition usually represented by an "x". Thus
priority processing of packets. if a TCAM entry having a data vector {0, x} representing 0 

In accordance with a presently preferred embodiment of 45 ^ thc lcft bit position and "don't care" in the right bit 

the invention, each queue has a fixed depth or memory position is compared to an input data vector having the value 

capacity. Variable depth queues could be used as will be i 0 - 1 } thcrc wil1 bc a match. There will also be a match if the 

recognized by those of ordinary skill in the art. Each queue "!P ut data vcctor has »he value {0,0}. However, the values 

also has associated with it at least one threshold, the value {1,0} and {1,1} for thc input data vector would both yield 

of which is programmable. so a no match condition. In certain types of addressing 

As presendy preferred, a WRED (weighted random early « rtain ^ •"J™ meaningful than other bits, 

detection) algorithm may be used to determine the probabil- thls *Mity to have a don t care selection (in effect, to 

ity of dropping a packet based upon the state of fullness of mask ccrtam blts ) can bc vcrv uscfilL 

its queue. For example, in a queue having two thresholds A method of using a TCAM (or CAM) is to take a data 

Thj and Th 2 (see FIG. 3) for Tr^Thi, the more full the 55 vector and test it sequentially against each address of the 

queue is, over a period of time, and past a particular TCAM until a match is found, then to use the address of the 

threshold such as Th lt the more likely a packet is to be match to index to a location in memory containing an 

dropped. The purpose here is to protect the higher priority appropriate response to the match condition. Another 

traffic. Suppose that there is high priority traffic such as method is to apply the data vector essentially simultaneously 

traffic used to control and regulate the operation of thc eo to all addresses in the TCAM or CAM and to index off of a 

communications network. If such traffic could not gel match, if any are found. In case of multiple matches, a 

through to its destination, the network might fail. Thus it is method of resolving the multiple match is required, 

desirable to set the threshold of other traffic so that it is Preferably, the first match is used and the rest of the entries 

dropped well before thc time that thc network becomes so are ignored to provide priority to thc first match. A match is 

congested that high priority traffic is at risk. 65 always guaranteed in accordance with a presently preferred 

By selecting a relatively low value for Th, for the low embodiment of thc present invention by providing a default 

priority queue, the low priority traffic in the low priority match for instance where no other match is found. 
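
By way of illustration only, the ternary comparison, first-match priority and default match described above might be sketched in Python as follows. The entry layout (value, care mask, result) and the names tcam_lookup and table are assumptions made for this sketch and do not appear in the disclosure. The loop models the sequential method of use; a hardware TCAM would typically compare all entries essentially simultaneously.

```python
from typing import List, Optional, Tuple

# Hypothetical entry layout: (value, care_mask, result).
# A 1 bit in care_mask means the bit must match; a 0 bit is "don't care" ("x").
Entry = Tuple[int, int, str]


def tcam_lookup(entries: List[Entry], key: int) -> Optional[str]:
    """Return the result of the first entry that matches key, else None."""
    for value, care_mask, result in entries:
        # Compare only the bit positions that the entry cares about.
        if (key & care_mask) == (value & care_mask):
            return result  # first match wins; later entries are ignored
    return None


# Two-bit example from the text: entry {0, x} matches {0,0} and {0,1} only.
table: List[Entry] = [
    (0b00, 0b10, "entry {0,x}"),  # left bit must be 0, right bit ignored
    (0b00, 0b00, "default"),      # every bit is "don't care": always matches
]
```

With this table, input vectors {0,0} and {0,1} select the first entry, while {1,0} and {1,1} fall through to the default entry, so a match is always returned.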
Every frame passing through the device is preferably
checked simultaneously against two tables:

(1) an L3 table implemented using a netflow switching
content addressable memory (CAM); and

(2) an aggregate table using an access control list CAM
preferably implemented as a ternary CAM (TCAM).

The netflow switching CAM has associated with each
entry (corresponding to each active flow) a packet counter,
a number of bytes seen counter, a token bucket count, and a
contract value in terms of rate and bucket size. A token
bucket algorithm is employed on each flow to determine
whether packets are in or out of profile and/or what threshold
(OPQTH) to assign. All updates to the CAM are preferably
done in hardware. The default OPQTH value can be overridden
for solicited bandwidth reservations (e.g., RSVP
flows) only.

The Access Control List CAM (ACL CAM) preferably
provides single clock cycle accesses when performing a
match check for each packet. This leaves plenty of bandwidth
to perform an additional QoS lookup based upon the
best match for the current packet. The best match is order
dependent for the entry in the ACL CAM, and may represent
any field in the packet upon which the administrator wishes
to base rate limitation decisions. More than one CAM entry
can produce the same n-bit CAM index. The n-bit CAM
index selects one of 2^n internal hardware counters, and
associated preconfigured contract levels, which become
affected by the packet statistics. The same or a similar token
bucket algorithm applied in the netflow CAM counters is
applied on these counters, allowing aggregation of traffic to
be processed in parallel. The processing results from the
netflow CAM and the aggregate counters are combined to
produce a final new codepoint or drop decision for the
current packet. Because this QoS approach is applied at the
hardware level, it can run at the line rate of the system and
avoid any effect on the overall switching performance of the
system.

Potentially a match will be found in both tables (the L3
table and the aggregate table) based upon two independent
match criteria. As pointed out above, the aggregate table will
always produce a match with at least a default mask. Both
tables maintain a last-seen timestamp and a token bucket.
When a match occurs, the two independent bucket counts
are examined to determine the frame's output queue
(OPQSEL) and output queue threshold (OPQTH). If either
bucket count exceeds a corresponding contract value, two
independent rate limitation decisions are made. Either of
these decisions may result in dropping or changing the
packet. Finally the two independent rate limitation decisions
are resolved to produce the final rate limitation decision for
the frame.

Token bucket algorithms are well known to those of
ordinary skill in the art. The basic idea is to provide a
method of averaging a value which may come in spurts, such
as a data transmission. In accordance with a presently
preferred embodiment of the present invention, a token
bucket algorithm is implemented with a counter for each
table entry in the aggregate table and the L3 table. The
counter is incremented for each in-profile byte associated
with the flow passing through the system. A minimum byte
increment may be enforced for short packets. The counter is
decremented by a fixed number (the "leak rate") associated
with the passage of a given amount of time. The leak rate
corresponds to a contract value. This has the effect that the
value stored in the counter will not grow over time as long
as the leak rate exceeds or equals the data throughput rate for
the flow. However, where the data throughput rate exceeds
the leak rate for a significant period of time, the counter
value will grow.

In a presently preferred embodiment of the present
invention, the actual computation of the value of the bucket
is made only when a flow hit occurs. Then the bucket count
is decremented by the difference between the current time
and the last seen time in time units multiplied by the leak rate
(per unit time) and incremented by the number of bytes in
the frame that had the flow hit.

FIGS. 4A, 4B and 4C are a block diagram of the apparatus
for a quality of service policy in accordance with a presently
preferred embodiment of the present invention.

Turning now to FIG. 4A, the packet enters on line 73. At
hash block 74 a hash index is obtained in a conventional
manner. The hash index is used to access the Layer 3 table
(L3 Table) 76 which may preferably be implemented in
RAM (random access memory). Hash block 74 together
with L3 table 76 form L3 CAM 78. The packet's flow is
compared to active flows existing in the L3 table 76. If a
match is found, i.e., the packet is part of an active flow, then
the statistics fields corresponding to the flow and stored in
L3 table 76 are accessed. If no match is found, then the L3
table 76 is updated to reflect the new flow. These statistics
fields may include, for each active L3 flow, a packet counter,
a number of bytes seen counter, a token bucket and a
contract value. If the flow is not an active flow, i.e., there is
no entry corresponding to the packet's flow in the L3 Table,
then a default is preferably used. Defaults may be set by the
System Administrator.

The packet is also routed from line 73 to a pair of TCAM
lookup operations. The first type of TCAM lookup 80 is an
aggregate table lookup which provides an index to the
Aggregate Table 84 and returns a two-bit priority code on
line 86 for combining the two ToS values. For example, the
2-bit priority code can indicate how to handle conflicts, e.g.,
"use the lowest threshold of the two ToS values", or another
scheme could be used as will now be clear to those of
ordinary skill in the art.

The second type of TCAM lookup 82 is an L3 Table
lookup. For each frame a TCAM L3 table lookup 82 is
performed and provides the contract value and token bucket
counter indirectly through an index that in a preferred
embodiment selects 1 of 64 choices. When hardware creates
an entry in the L3 table 76, it writes these parameters into the
L3 table 76 over line 89. Later when a frame matches the
entry, there are 2 sets of parameters provided:

(1) one set of parameters provided by the L3 Table lookup
82 into the TCAM; and

(2) a second set of parameters read from the L3 table 76.

The CAM or TCAM 88 will be logically separated into a
Layer 3 Table QoS policy area and an Aggregate QoS policy
area.

The data from the TCAM L3 Table lookup 82 is applied
as an input to MUX 90 on line 91 as is the current data from
the L3 table on line 92.

A selection value on line 94 from the L3 Table 76 selects
whether to use the parameters from the TCAM L3 Table
lookup 82 or the parameters from the L3 table on line 92.

By default, the parameters coming from the TCAM L3
Table lookup 82 are used. The system can be told with
software to use the parameters stored in the L3 Table 76
instead. This approach is desirable when the parameters
have been modified by software intervention. The L3 Table
parameters may be initially set by software prior to flow
existence or overridden by software. The L3 Table initially
learns its parameters by performing TCAM L3 Table lookup
82 into the TCAM 88.
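
By way of illustration only, the token bucket computation described above (the bucket is evaluated only on a flow hit, drained by the leak rate times the elapsed time, and charged only for in-profile bytes subject to a minimum byte increment) might be sketched in Python as follows. The names FlowEntry, on_flow_hit and min_bytes, the default of 64 bytes, and the flooring of the drained count at zero are assumptions made for this sketch; the disclosure does not specify them.

```python
from dataclasses import dataclass


@dataclass
class FlowEntry:
    """Hypothetical per-entry state kept in the L3 or aggregate table."""
    bucket: int     # token bucket count, in bytes
    last_seen: int  # time of the last flow hit, in time units
    leak_rate: int  # bytes drained per time unit
    contract: int   # bucket count above which a packet is out of profile


def on_flow_hit(entry: FlowEntry, now: int, pkt_bytes: int,
                min_bytes: int = 64) -> bool:
    """Update the bucket on a flow hit; return True if the packet is in profile."""
    elapsed = now - entry.last_seen
    # Drain for the elapsed time. Flooring at zero is an assumption of this sketch.
    drained = max(0, entry.bucket - elapsed * entry.leak_rate)
    # Short packets are charged at least min_bytes.
    charged = drained + max(min_bytes, pkt_bytes)
    entry.last_seen = now
    if charged > entry.contract:
        # Out of profile: the leak is applied but the packet bytes are not charged.
        entry.bucket = drained
        return False
    entry.bucket = charged
    return True
```

Because the bucket is only touched on a flow hit, no periodic timer is needed per entry; the stored last-seen time carries all the information required to apply the leak lazily.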
The selected information includes the contract value and is
applied over line 96 to the L3 table rate limitation block 98.
A token bucket is operated as discussed above over line 100
with the L3 table 76. The outputs of L3 table rate limitation
block 98 include "L3TOS", an 8-bit representation of the
calculated ToS for the packet, "L3DROP", a value indicating
whether or not to drop the packet based upon it being out of
profile or not, and "L3ALTER", a one-bit value indicating
whether or not to alter the codepoint of the packet.

The aggregate table side operates similarly. The bank of
aggregate counters used for token bucket analysis is preconfigured
with the codepoint and the token bucket parameters
to use. The priority is not stored, allowing different
policies to map to the same aggregate counter (several
matches may map to the same aggregate counter index, with
different priorities for resolving which ToS to use, depending
upon the actual flow).

The TCAM aggregate table lookup 80 into TCAM 88
provides an index on line 101 used to access the Aggregate
Table 84. The contract value and token bucket counter are
used in aggregate table rate limitation 102 to produce
"AGTOS", the ToS based upon the aggregate table processing
branch 220 of FIG. 5A, "AGDROP", the dropping decision
based upon branch 220, and "AGALTER", a one-bit value
indicating whether or not to alter the codepoint of the packet.

The packet processing described herein is based upon the
DS/ToS definition. If a valid ToS/CoS is not available, e.g.,
for a non-802.1q and non-IP packet, a working value is
derived from other sources for internal use as discussed
above. For legacy ToS definitions (i.e., the present ToS
definition), the precedence bits from the ToS are mapped
into DS/ToS values with a conventional mapping. For
frames that are not IP, the 3-bit CoS field is mapped into an
8-bit ToS field with a conventional mapping. This approach
is also applied if the DS/ToS field of an incoming IP frame
is assumed to be invalid for some reason.

The ToS remap takes any input ToS and maps it to a final
AGTOS or L3TOS. It is configured by software. The meaning
of the various possible values of the 8-bit ToS may be set
by software as desired.

Turning now to FIG. 4B, a method and apparatus for
combining certain bits of L3TOS and AGTOS into a resulting
one-bit TOSL3AG value in accordance with a presently
preferred embodiment of the present invention is shown. A
programmable compare mask 104 is used to mask bits which
will not be used in the comparison. Then the two masked
signals are applied to a comparing MUX 106, providing a
one-bit indication of which value is larger.

Turning now to FIG. 4C, a method and apparatus for
resolving L3TOS, L3DROP, L3ALTER, AGTOS, AGDROP
and AGALTER using the two-bit priority value "priority"
from FIG. 4A is shown in accordance with a presently
preferred embodiment of the present invention.

TOSL3AG, L3ALTER, AGALTER and the two-bit priority
value are applied to a programmable 5:1 decoder 106.
Using a selected mechanism to resolve the various inputs (it
would be as simple as "always choose L3TOS"), a bit on
select line 108 to MUX 110 chooses L3TOS or AGTOS
which is then provided on line 112. Optionally certain bits
of the original ToS ("ORIGTOS") may be passed through
and used to override the value on line 112 using bit mask 114
and MUX 116. The output of this process on line 118 is
applied to 8:3 CoS Mapping 120 which results in a 3-bit
output on line 122. This is, in turn, optionally applied to
MUX 124 where, if the "DO NOT CHANGE" signal 126 is
asserted, the original CoS value "ORIGCOS" on line 128 is
passed as CoS_F on line 130, otherwise the value of CoS on
line 122 is passed as CoS_F on line 130.

Similarly, the calculated ToS on line 118 is applied to
MUX 132 where, if the "DO NOT CHANGE" signal 126 is
asserted, the original ToS value "ORIGTOS" on line 134 is
passed as ToS_F on line 136, otherwise the value of ToS on
line 118 is passed as ToS_F on line 136.

Finally, L3DROP and AGDROP are combined and
resolved as follows. The two-bit priority value, L3DROP
and AGDROP are applied to a 4:1 programmable decoder
138 to obtain a dropping decision in accordance with a
programmable policy. Preferably the priority value is used to
select L3DROP or AGDROP. Other policies could also be
programmed such as, for example, "always use L3DROP."
The result of this decision is output on line 140 as a
signal. The signal on line 140 is combined
with a "DO NOT CHANGE" signal on line 126 to yield
a signal DROP_F on line 144 which follows the signal on line
140 unless "DO NOT CHANGE" is asserted, whereupon the
value of the signal on line 144 is set to "DO NOT DROP."

FIGS. 5A, 5B, 5C and 5D are a flow chart detailing an
implementation of a presently preferred embodiment of the
present invention. At reference numeral 200 the process
starts with the arrival of a packet at reference numeral 202
at a node of the communications network. For packets
having a CoS field and/or a ToS field, this information is
extracted at reference numeral 204. Optionally, at reference
numeral 206 it is possible to bypass some or all of the packet
processing if the packet came from a "trusted source", that
is, one that is already implementing a similar process in
accordance with the policy implemented by the network
administrator. Where the packet comes from a trusted source
(as can be detected by knowing the physical port of the
device that it arrived on) then a full bypass or partial bypass
can be implemented. In a full bypass, as at reference numeral
208, control is shifted to reference numeral 210 in FIG. 5D,
discussed below. In a partial bypass, as at reference numeral
212, control is shifted to reference numeral 214 in FIG. 5B.
This is also discussed below.

If the packet is not from a trusted source, or if bypassing
is not implemented, control is passed to reference numeral
216 in FIG. 5A. At reference numeral 216 the packet is
parsed for its micro flow. In this process, the pertinent part
of the flow is extracted for use in accessing the CAMs
associated with the Layer 3 Table and/or the Aggregate
Table.

Now, in accordance with a presently preferred embodiment
of the present invention, control passes in parallel
along branches 218 and 220 proceeding from reference
numeral 216. Branch 218 processes information using the
Layer 3 Table approach discussed above. Branch 220 processes
information using the ACL CAM/Aggregate Table
approach discussed above. While it is preferred to do both in
parallel, either can be used exclusively and is still within the
inventive concepts disclosed herein.

Following branch 218, the micro flow is compared to the
entries in the Layer 3 Table at reference numeral 222. The
closest match will result in obtaining either directly, or
through a pointer, the OPQSEL (output queue select) and
OPQTH (output queue threshold) values for the micro flow
(assuming that the micro flow has been seen recently and is
therefore contained in the Layer 3 Table). In accordance with
a presently preferred embodiment of the invention, the
OPQSEL can be either 0 or 1 representing two output queues
and the OPQTH can be 0, 1, 2 or 3 representing four levels
of threshold. The three-bit CoS value is simply the OPQSEL
bit and the two OPQTH bits. This value is sent to the port
to control output queue selection and threshold. Control is
then transferred at 224 to reference numeral 214 of FIG. 5B.
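
By way of illustration only, forming the three-bit CoS value from the OPQSEL bit and the two OPQTH bits might be sketched in Python as follows. The bit order (OPQSEL as the most significant bit) and the function names pack_cos and unpack_cos are assumptions made for this sketch; the disclosure states only that the CoS value consists of these three bits.

```python
from typing import Tuple


def pack_cos(opqsel: int, opqth: int) -> int:
    """Combine a 1-bit queue select and a 2-bit threshold into a 3-bit CoS."""
    if opqsel not in (0, 1):
        raise ValueError("OPQSEL must be 0 or 1 (two output queues)")
    if not 0 <= opqth <= 3:
        raise ValueError("OPQTH must be 0..3 (four threshold levels)")
    # Assumed layout: bit 2 = OPQSEL, bits 1..0 = OPQTH.
    return (opqsel << 2) | opqth


def unpack_cos(cos: int) -> Tuple[int, int]:
    """Split a 3-bit CoS back into (OPQSEL, OPQTH) under the same layout."""
    return (cos >> 2) & 0x1, cos & 0x3
```

Under this assumed layout, the output port can recover the queue and the threshold level from the CoS value with a shift and a mask.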
Similarly, following branch 220, the micro flow is masked
at reference numeral 226 and compared to the entries in the
ACLCAM/Aggregate Table. Preferably, the first match is
reported and an OPQSEL and OPQTH value derived therefrom.
At reference numeral 228, control is transferred to
reference numeral 214 of FIG. 5B.

The process starting at reference numeral 214 is performed
for both branch 218 and branch 220 separately.

If the policy is so set that rate limiting is in effect at
reference numeral 230, control transfers to the token bucket
process starting at reference numeral 232. Otherwise, at
reference numeral 234, control is transferred to reference
numeral 236 of FIG. 5D.

The token bucket works as follows. At reference numeral
232 a byte count denoted "BYTECOUNT 1" is read from
the data store associated with the L3 table or the aggregate
table. BYTECOUNT 2 is set to BYTECOUNT 1 + MAX
(MINBYTES, PKTBYTES), that is to say that the byte
counter is set to be incremented by the larger of the number
of bytes in the present packet or some minimum number of
bytes which will be attributed to small packets. This is done
to take into account the fact that small packets have a larger
real overhead to the communications network than their
respective byte counts would tend to indicate, thus they are
treated as if they have an artificially larger number of bytes.
This process is optional. BYTECOUNT 3 is set to the
original value of BYTECOUNT 1 to hold it for future use
detailed below.

Once the byte count is determined at reference numeral
232, control transfers to reference numeral 238. At reference
numeral 238, a determination is made as to whether the
minimum time stamp interval has elapsed since the last
packet was processed which matches the characteristics of
the micro flow being processed. If not, reference numeral
240 is skipped. If the minimum interval has elapsed, then
reference numeral 240 decrements BYTECOUNT 2 by the
leak rate (LEAKRATE) multiplied by the elapsed time (#
intervals). Hence that value is the leak rate per unit interval
multiplied by the number of intervals elapsed based upon the
last seen timestamp and current time stamp values. The last
seen time stamp is preferably stored in the pertinent table
with the pertinent micro flow information. After reference
numeral 240, control passes to node D, 242 and then to
reference numeral 244.

At reference numeral 244 BYTECOUNT 2 is compared
to the contract value for the flow read from the appropriate
data store. If BYTECOUNT 2 exceeds the contract value
then the packet is out of profile and control passes to
reference numeral 246; if not, the packet is in profile and
control passes to reference numeral 248.

At reference numeral 248 BYTECOUNT 1 in the data
store associated with the table is updated to the value of
BYTECOUNT 2. Control then passes to node C, 236 via
reference numeral 250. At reference numeral 246, since the
packet is out of profile, the BYTECOUNT 1 value in the
data store associated with the table is updated for leak rate
but is not charged for bytes associated with the packet.
Hence, BYTECOUNT 1 = BYTECOUNT 3 - (LEAKRATE
* # INTERVALS). Control then passes to reference numeral
252 where a value for the codepoint (ToS/CoS) is determined.
This value will preferably incorporate an output
threshold (OPQTH) which increases the likelihood that the
packet will be dropped in various congestion situations, as
it is out of profile. Control then passes to node C, 236 via
reference numeral 254.

Turning now to FIG. 5D, from reference numeral 236,
control passes to reference numeral 256 where the two
ToS values (L3 ToS and AG ToS) derived from branch 218
and branch 220, respectively, are compared and resolved as
discussed above in conjunction with the discussion of FIGS.
4B and 4C. Control is then passed to reference numeral 258
where a drop/no drop decision is made based upon policy,
only if BYTECOUNT 2 is greater than the contract value
associated with the packet flow. If the decision is made to
drop the packet, it is simply forwarded to no port at reference
numeral 260, otherwise control passes to reference numeral
210 and from there to reference numeral 262 where the
packet is sent to a selected output port. At reference numeral
264 the final ToS is sent to the output port. At reference
numeral 266 the port selects the output queue based upon the
ToS/CoS. At reference numeral 268 the port optionally
implements WRED on the selected queue. At reference
numeral 270 if the packet is an IP packet, control may be
optionally transferred to reference numeral 272 so that the
packet's DS/ToS field 26 may be rewritten to incorporate the
calculated ToS. At reference numeral 274 if the packet has
an 802.1q tag and CoS field, control may be optionally
transferred to reference numeral 276 so that the packet's
802.1q CoS field 24 may be rewritten to incorporate the
calculated CoS. Optionally the CoS field may be incorporated
into the packet with ISL encapsulation where it can be
used downstream. The process is complete at reference
numeral 278.

Alternative Embodiments

Although illustrative presently preferred embodiments
and applications of this invention are shown and described
herein, many variations and modifications are possible
which remain within the concept, scope, and spirit of the
invention, and these variations would become clear to those
of skill in the art after perusal of this application. The
invention, therefore, is not to be limited except in the spirit
of the appended claims.
What is claimed is: 

1. An apparatus for implementing a quality of service 
policy, comprising: 

a packet input; 

a first flow information extractor having a single mask, 
said mask extracting specified flow information from 
packets appearing at said packet input; 

a first content addressable memory (CAM) containing 
entries corresponding to active packet flows passing 
through the apparatus, said first CAM responsive to 
said specified flow information to provide first quality 
of service parameters; 

a second flow information extractor having multiple 
masks, said multiple masks extracting aggregate flow 
information from packets appearing at said packet 
input; and 

a second CAM containing entries corresponding to user- 
configurable aggregations of packet flows passing 
through the apparatus, said second CAM responsive to 
aggregate flow information to provide, in conjunction 
with an aggregate flow table, second quality of service 
parameters. 

2. The apparatus of claim 1, further comprising: 

a first rate limiter associated with said first CAM, said first 
rate limiter generating at least a first codepoint for each 
packet appearing at said packet input; and 

a second rate limiter associated with said second CAM 
and said aggregate flow table, said second rate limiter 
generating at least a second codepoint for each packet
appearing at said packet input.
3. The apparatus of claim 2, further comprising:
a codepoint resolver for applying pre-configured policy

inputs to said first codepoint and said second codepoint 
to generate a final codepoint for each packet appearing 
at said packet input. 

4. The apparatus of claim 3, further comprising: 
a packet modifier responsive to said final codepoint for 

modifying CoS fields of 802.1q and ISL packets prior
to transmission from the apparatus. 

5. The apparatus of claim 3, further comprising: 
a packet modifier responsive to said final codepoint for 

modifying ToS/Differentiated Service fields of IP pack- 
ets prior to transmission from the apparatus. 

6. The apparatus of claim 3, further comprising: 
an output port having a plurality of output queues, one of 

said output queues being selected based upon said final 
codepoint. 

7. The apparatus of claim 6, wherein an output queue 
threshold value for said selected output queue being selected
based upon said final codepoint. 

8. The apparatus of claim 7, further comprising: 
means for dropping at least some packets of said selected 

output queue when said output queue threshold value is 
exceeded by an average queue depth at said selected 
output queue. 

9. A method for implementing a quality of service policy, 
the method comprising: 

receiving packets at a packet input; 

extracting specified flow information from packets 
appearing at the packet input; 

using the specified flow information, determining a match 
with an entry in a first content addressable memory 
(CAM), the first CAM containing entries corresponding
to active packet flows, the first CAM responsive to
the specified flow information to provide first quality of 
service parameters; 

extracting aggregate flow information from packets 
appearing at the packet input; and

using the aggregate flow information, determining a 
match with an entry in a second CAM associated with 
an aggregate flow table, the second CAM containing a 
plurality of entries, each of the entries corresponding to 
user-configurable aggregations of packet flows and 
containing second quality of service parameters. 

10. The method of claim 9, further comprising: 
generating at least a first codepoint for each packet 

appearing at the packet input, the at least a first code- 
point being associated with the first CAM; and 
generating at least a second codepoint for each packet 
appearing at the packet input, the at least a second 
codepoint being associated with the second CAM. 

11. The method of claim 10, further comprising: 
applying pre-configured policy inputs to the at least a first 

codepoint and the at least a second codepoint to gen- 
erate a final codepoint for each packet appearing at the 
packet input. 

12. The method of claim 11, further comprising: 
modifying CoS fields of 802.1q and ISL packets prior to

transmission. 

13. The method of claim 11, further comprising: 
modifying ToS/Differentiated Service fields of IP packets

prior to transmission. 
14. The method of claim 11, further comprising:
selecting an output queue based upon the final codepoint. 

15. The method of claim 14, further comprising: 
based upon the final codepoint, selecting an output queue 

threshold value for the selected output queue. 

16. The method of claim 15, further comprising: 
dropping at least some packets of the selected output 

queue when the output queue threshold value is 
exceeded by an average queue depth at the selected 
output queue. 

17. An apparatus for implementing a quality of service 
policy, the apparatus comprising: 

a packet input for receiving packets; 

means for extracting specified flow information from 
packets appearing at the packet input; 

means, using the specified flow information, for deter- 
mining a match with an entry in a first content addres- 
sable memory (CAM), the first CAM containing entries 
corresponding to active packet flows, the first CAM 
responsive to the specified flow information to provide 
first quality of service parameters; 

means for extracting aggregate flow information from 
packets appearing at the packet input; and 

means, using the aggregate flow information, for deter- 
mining a match with an entry in a second CAM 
associated with an aggregate flow table, the second 
CAM containing a plurality of entries, each of the 
entries corresponding to user-configurable aggrega- 
tions of packet flows and containing second quality of 
service parameters. 

18. The apparatus of claim 17, further comprising:
means for generating at least a first codepoint for each
packet appearing at the packet input, the at least a first
codepoint being associated with the first CAM; and
means for generating at least a second codepoint for each
packet appearing at the packet input, the at least a
second codepoint being associated with the second
CAM.

19. The apparatus of claim 18, further comprising:
means for applying pre-configured policy inputs to the at
least a first codepoint and the at least a second codepoint
to generate a final codepoint for each packet
appearing at the packet input.

20. The apparatus of claim 19, further comprising:
means for modifying CoS fields of 802.1q and ISL
packets prior to transmission.

21. The apparatus of claim 19, further comprising:
means for modifying ToS/Differentiated Service fields of
IP packets prior to transmission.

22. The apparatus of claim 19, further comprising:
means for selecting an output queue based upon the final
codepoint.

23. The apparatus of claim 22, further comprising:
means, based upon the final codepoint, for selecting an
output queue threshold value for the selected output
queue.

24. The apparatus of claim 23, further comprising:
means for dropping at least some packets of the selected
output queue when the output queue threshold value is
exceeded by an average queue depth at the selected
output queue.

ABSTRACT

A packet filter for a router performs generalized packet
filtering allowing range matches in two dimensions, where
ranges in at least one dimension are defined as
a power of two. To associate a filter rule with a received
packet EP, the packet filter employs a 2-dimensional interval
search and memory look-up with the filter-rule table. Values
of s_m of filter rule r_m=(s_m,d_m) in one dimension are desirably
ranges that are a power of two, such as prefix ranges, which
are represented by a binary value having a "length" defined
as the number of bits of the prefix. The d_m may be single
points, ranges defined as prefix ranges, and/or ranges defined
as continuous ranges. The packet filter employs preprocessing
of the filter-rules based on prefix length as a power of 2
in one dimension and decomposition of overlapping segments
into non-overlapping intervals in the other dimension
to form the filter-rule table. A preprocessing algorithm
searches in one dimension through filter rules and arranges
the corresponding filter-rule rectangle segments according to
prefix length. Then, in the other dimension, the overlapping
filter rectangle segments are decomposed into non-overlapping
intervals, and the highest priority filter-rule
overlapping each non-overlapping interval is associated
with that interval. A filter-rule table is then constructed with
entries ordered according to prefix length and non-overlapping
interval, each entry associated with a particular
filter-rule. A packet classification algorithm then matches the
field or other parameter information in the packet to the
filter-rule table entries to identify the filter-rule rectangle
associated with the filter-rule to be applied to the packet.

30 Claims, 9 Drawing Sheets

PACKET CLASSIFICATION METHOD AND
APPARATUS EMPLOYING TWO FIELDS

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims the benefit of the filing date of 
U.S. provisional application No. 60/073,996, filed on Feb. 9, 
1998. 

BACKGROUND OF THE INVENTION

1. Field of the Invention 

The present invention relates generally to packet forwarding
engines used in telecommunications, and, in particular,
to router algorithms and architectures for supporting packet
filter operations using two packet fields.

2. Description of the Related Art 

Packet-based communication networks, such as the
Internet, typically employ a known protocol over paths or
links through the network. Commonly known protocols are,
for example, Transmission Control Protocol/Internet Protocol
(TCP/IP) or Reservation Set-up Protocol (RSVP). Routers
provided in a communication network provide a packet
forwarding function whereby input data, usually in the form
of one or more data packets, is switched or routed to a
further destination along a network link. FIG. 1 shows a
typical form of a data packet 20, which may be of variable
length. Data packet 20 comprises, for example, a header 125
and payload data 150. Header 125 contains fields or
parameters, such as a source address 130 where the data
originates and at least one destination address 135 where the
data is to be routed. Another parameter in the header 125
may be a protocol type 140 identifying a particular protocol
employed in the communication network.

FIG. 2 shows a router 245 of a network node receiving
streams or flows of data packets from input links 247 and
routing these packet streams or flows to output links 260. To
perform a forwarding function, router 245 receives a data
packet at an input link 247 and a control mechanism 250
within the router utilizes an independently generated look-up
table (not shown) to determine to which output link 260
the packet should be routed. It is understood that the packet
may first be queued in buffers 252 before being routed, and
that the forwarding function is desirably performed at a high
rate for high forwarding throughput.

Source and destination addresses may be logical
addresses of end hosts (not shown). Thus, data packet 20 of
FIG. 1 may further comprise unique source port numbers
137 and destination port numbers 139. Header 125 may also
include, for example, certain types of flags (not shown) in
accordance with protocol type 140, such as TCP, depending
upon the receiver or transmitter application.

Network service providers, while using a shared backbone
infrastructure, may provide different services to different
customers based on different requirements. Such
requirements may be different service pricing, security, or
Quality of Service (QoS). To provide these differentiated
services, routers typically include a mechanism for 1) classifying
and isolating traffic, or packet flows, from different
customers, 2) preventing unauthorized users from accessing
specific parts of the network, and 3) providing customized
performance and bandwidth in accordance with customer
expectations and pricing.

Consequently, in addition to the packet forwarding
function, router 245 of FIG. 2 may perform a packet filtering
function. Packet filtering may be employed, for example, as
"firewall protection" to prevent data or other information
from being routed to certain specified destinations within the
network. To perform packet filtering, the router 245 may be
provided with a table or list of filter rules specifying that
routing of packets sent from one or more specified sources
is denied or that specific action is to be taken for a packet
having a specified source address. Such packet filtering may
be employed by layer four switching applications.

Specifically, packet filtering parses fields from the packet 
header 125 including, for example, both the source and 
destination addresses. Parsing allows each incoming packet 
to be classified using filter rules defined by network man- 
agement software, routing protocols, or real-time reserva- 
tion protocols such as RSVP. 

Filter rules may also specify, for example, that received
packets with fields specifying a particular destination
address should or should not be forwarded through specific
output links, or that some other specific action should be
taken before routing such received packets. Thus, a variety
of filter rules may be implemented based on packet field
information. For example, such filter rules might be based
on 1) source addresses; 2) destination addresses; 3) source
ports; 4) destination ports; and/or 5) any combination of
these fields.

Packet filtering of the prior art generally requires either an
exact match operation of the fields or a match operation
defined in terms of field ranges for a filter rule. Field ranges
may specify, for example, ranges of source addresses, destination
addresses, source/destination port numbers, and/or
protocol types. Filter rules are then applied to every packet
that the router receives; that is, for each packet received by
the router, every filter rule is successively applied to the
packet to ascertain whether that packet is to be forwarded,
restricted, or re-routed according to the filter rule. However,
implementation of a large number of filter rules in a router
(e.g., 500 or more) is time consuming with respect to
processor execution time since all filter rules must be tested.
Hence, routers implementing filters having a large number
of filter rules have decreased throughput, compromising
quality of service (QoS). Thus, for a router such as router
245 to maintain a relatively high level of throughput, the
filtering function must be performed at a very high rate.
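
The successive application of filter rules described above can be sketched as a linear scan. This is only an illustrative sketch, not code from the patent: the `FilterRule` type, the first-match-wins policy, and the default action are assumptions introduced for the example.

```python
from dataclasses import dataclass

@dataclass
class FilterRule:
    src: tuple    # hypothetical: inclusive (low, high) range of source addresses
    dst: tuple    # hypothetical: inclusive (low, high) range of destination addresses
    action: str   # e.g. "forward" or "deny"

def classify_linear(rules, src, dst, default="forward"):
    """Test each rule in turn; the first matching rule decides (an assumption).

    The work per packet grows linearly with len(rules), which is the
    throughput problem the text describes for large rule sets.
    """
    for rule in rules:
        if rule.src[0] <= src <= rule.src[1] and rule.dst[0] <= dst <= rule.dst[1]:
            return rule.action
    return default

rules = [
    FilterRule(src=(0, 3), dst=(8, 15), action="deny"),
    FilterRule(src=(0, 15), dst=(0, 15), action="forward"),
]
```

With 500 or more rules, every packet that matches only a late rule pays for all of the earlier comparisons.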

The IP packet header fields may contain up to 128 bits of
parameter information, including source and destination
addresses, physical source and destination port numbers,
interface number, protocol type, etc. Each of the fields or
parameters in the header may be represented as being along
an axis of a dimension. The general packet classification
problem of a packet filter may then be modeled as a
point-location in a multi-dimensional space. One or more
field values of a packet define a point in the multi-dimensional
space. A packet filter rule associated with a
range of values of each field defines an object in the
multi-dimensional space.

A point-location algorithm in a multi-dimensional space
with multi-dimensional objects finds the object that a particular
point belongs to. In other words, given a received
point EP={E_1, E_2, ..., E_D} in a space having D dimensions,
find one or more of a set of n D-dimensional objects
including the point (n being an integer greater than 0). The
general case of D>3 dimensions may be considered for the
problem of packet classification. As is known in the art, the
best algorithms optimized with respect to time or space have
either an O(log^(D-1) n) time complexity with O(n) space or an
O(log n) time complexity with O(n^D) space, where O(·)
mathematically represents "on the order of." Comparing
algorithms on the basis of the order of operations is particularly
useful since operations may be related to memory
requirements (space) and execution time (time complexity).

Though algorithms with these complexity bounds are
useful in many applications, they are not currently useful for
packet filtering. First, packet filtering must complete within
a specified amount of time, which generally forces a value
for n to be relatively small relative to asymptotic bounds, but
routers typically filter packets with a number of filter rules
in the range of a few thousand to tens of thousands.
Consequently, even point-location algorithms with polylogarithmic
time bounds are not practical for use in a
high-speed router.

For example, router 245 desirably processes n=1K filter
rules of D=5 dimensions within 1 μs to sustain a 1 million-
packets-per-second throughput. However, an algorithm
employed with O(log^(D-1) n) complexity and O(n) space has
a log^4 1024 execution time and O(1024) space, which
requires 10K memory accesses (look-ups) per packet. If an
O(log n) time, O(n^4) space algorithm is employed, then the
space requirement becomes prohibitively large (greater than
1000 Gigabytes).

For the special case of two dimensions, the filter rules
defined for field ranges are modeled as objects in two
dimensions, for example, forming rectangles in the
2-dimensional space. For a 2-dimensional space having
non-overlapping rectangles, some packet filter algorithms
have logarithmic complexity and near-linear space complexity.
However, these algorithms do not consider the special
problem related to arbitrary overlapping rectangles in the
multi-dimensional space requiring a decision of which overlapping
filter rules to apply to a packet. The problem may be
resolved through a priority of the longest field prefix. An
algorithm of the prior art where the time complexity is
O(log(log N)) is based on stratified tree searches in a finite
space of discrete values. Examples of these algorithms are
discussed in, for example, M. De Berg, M. van Kreveld, and
J. Snoeyink, Two- and Three-dimensional Point Location in
Rectangular Subdivisions, Journal of Algorithms,
18:256-277, 1995. Data structures employed by this prior
art algorithm require a perfect hashing operation in every
level of the tree. The pre-processing complexity, without
using a randomized algorithm, of calculating the perfect
hash is O(min(hV, n^3)), where h is the number of hash
functions that must be calculated and V is the size of the
space. Consequently, for a 2-dimensional space, longest-
prefix lookups may result in executions requiring 2^32 cycles,
even for a relatively small number of filter rules, even if
pre-processing is only required once every several seconds.

SUMMARY OF THE INVENTION

The present invention relates to a packet filter associating
at least one filter rule with a packet, each filter rule and the
packet characterized by values in first and second
dimensions, the filter rule to be applied to the packet by a
router in a communications network. In accordance with an
exemplary embodiment, a filter-rule table is provided with
each entry of the filter-rule table corresponding to a prefix
value having a length in the first dimension and at least one
interval in the second dimension. Each prefix value matching
the value of the packet in the first dimension is identified,
and each interval corresponding to identified prefix values
containing the value of the packet in the second dimension
is retrieved. A solution interval is determined as the interval
associated with the prefix value associated with a predetermined
metric and containing the value of the packet in the
second dimension; and the filter rule corresponding to the
solution interval is associated with the packet.

In accordance with another exemplary embodiment, the
filter-rule table is created by first assigning each filter-rule to
one or more prefix values based on the values in the first
dimension; and then projecting, for each prefix value having
the same length, values of each corresponding filter rule of
the prefix value onto the second dimension to define at least
one filter-rule segment. Each filter-rule segment is decomposed
into one or more non-overlapping intervals associated
with each prefix value having the same length and corresponding
filter rule in the second dimension; and a pointer
is generated for each non-overlapping interval identifying
each filter rule contained in the non-overlapping interval.
The pointer is stored as an entry of the filter-rule table
associated with a prefix value length and a non-overlapping
interval.

BRIEF DESCRIPTION OF THE DRAWINGS

Other aspects, features, and advantages of the present
invention will become more fully apparent from the following
detailed description, the appended claims, and the
accompanying drawings in which:

FIG. 1 shows a typical form of a data packet of a
communications network;

FIG. 2 shows a router of a network node receiving and
forwarding packet streams;

FIG. 3 illustratively depicts prefix ranges of a field in an
s-dimension where the prefix ranges are a power of two;

FIG. 4 illustratively depicts segments of a filter rule
having one or more field ranges of destination addresses
projected as horizontal intervals;

FIG. 5 illustrates a 2-dimensional space for an exemplary
packet filter in accordance with the first embodiment of the
present invention;

FIG. 6 illustrates steps of an exemplary pre-processing
algorithm in accordance with the present invention;

FIG. 7 illustrates steps of decomposing overlapping intervals
into non-overlapping intervals as shown in FIG. 6;

FIG. 8 illustrates steps of an exemplary classification
algorithm in accordance with the present invention;

FIG. 9A illustrates an example of trie structure of an
exemplary embodiment employing virtual intervals to
reduce search time of a classification algorithm;

FIG. 9B illustrates an example of point propagation of an
exemplary embodiment employing virtual intervals to
reduce search time of a classification algorithm;

FIG. 10 illustrates a hardware system for implementation
of the packet filter in accordance with the present invention
in a packet forwarding engine or router;

FIG. 11 shows a filter processor receiving incoming
packets, storing field parameters and classifying a packet in
accordance with the present invention; and

FIG. 12 shows an example memory organization of a
filter-rule table for the system illustrated in FIG. 10, which
depicts a filter-rule.

DETAILED DESCRIPTION

For exemplary embodiments of the present invention, a
packet filter associates a 2-dimensional filter rule with an
arriving packet EP having fields S and D. For a unicast
forwarding packet filter, these values S and D may be source
and destination address values, respectively, of the packet.
For a multicast forwarding packet filter, the value S may be
the source address value of a packet and D a group identifier
(ID) that identifies the multicast group that the packet may
be forwarded to. The value for S may be contained in a range
of binary values s, s being associated with an axis in one
dimension (the s-dimension). Similarly, the value for D may
be contained in a range of binary values d, d being associated
with another axis in another dimension (the d-dimension).
The packet filter includes a set of n packet-filtering rules RP
having 2-dimensional filter rules r_1 through r_n to be associated
with the packet. Each filter rule r_m, m an integer greater
than 0, may be denoted as r_m={s_m,d_m}, which is a set of two
field ranges s_m and d_m in the s-dimension and d-dimension
that define the filter rule r_m in the 2-dimensional space.

To associate a filter rule with a received packet EP, the
packet filter employs a 2-dimensional interval search and
memory look-up with the filter-rule table. Locating a pair of
values S and D for fields of a packet EP and associating a
2-dimensional filter rule with the packet may be modeled as
a point-location problem in a 2-dimensional space. The
packet EP having field values S and D arrives at the router
and is defined as a query point (S, D) of a 2-dimensional
space. For the point-location problem where packet filtering
involves orthogonal rectangular ranges, a search in
2-dimensions of a 2-dimensional, orthogonal, rectangular
range decomposes each rectangle into a set of 1-dimensional
filter-rule intervals to allow 1-dimensional searches over
1-dimensional intervals.

For a simple embodiment, preprocessing of filter-rules
may construct the filter-rule table as a 2-dimensional look-up
table comprising filter-rule pairs (s_m,d_m), m an integer
greater than 0, where each s_m is a prefix of possible source
addresses and each d_m is a contiguous range, or a single
point, of possible destination addresses or group IDs. For the
table, each pair (s_m,d_m) defines a filter-rule rectangle
r_m={s_m,d_m} for the n packet-filtering rules r_1 through r_n in
2-dimensions, and rectangles may overlap. The point location
in a 2-dimensional space operates as follows: given the
query point (S, D) of packet EP, the search or look-up
algorithm for packet classification finds an enclosing filter-rule
rectangle r_m=(s_m,d_m), if any, such that the query point
(S, D) is contained in r_m, and such that s_m is the most specific
filter according to a predefined metric, such as, for example,
the longest matching prefix of field value S or the highest
priority rule for a given prefix length.
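
The point-location rule just stated can be written down directly as a brute-force search. The sketch below is a reference definition only, not the patent's look-up algorithm; the tuple layout, the bit-string prefixes, the 4-bit field width, and "larger number means higher priority" are all assumptions made for illustration.

```python
def prefix_matches(prefix, value, width):
    """True if the width-bit binary form of value starts with the bit string prefix."""
    return format(value, f"0{width}b").startswith(prefix)

def find_rule(rules, s, d, width=4):
    """Return the name of the enclosing rectangle with the most specific s-prefix.

    rules: list of (name, s_prefix, (d_low, d_high), priority), where s_prefix
    is a bit string and the d range is inclusive. Ties in prefix length are
    broken by the larger priority value (hypothetical convention).
    """
    best = None
    for name, s_prefix, (d_low, d_high), priority in rules:
        if prefix_matches(s_prefix, s, width) and d_low <= d <= d_high:
            key = (len(s_prefix), priority)
            if best is None or key > best[0]:
                best = (key, name)
    return None if best is None else best[1]

example_rules = [
    ("r1", "0", (0, 7), 1),
    ("r2", "01", (4, 12), 1),
    ("r3", "01", (6, 9), 2),
]
```

For the query point (S=0110, D=0101), both `r1` and `r2` enclose the point, and `r2` is chosen because its prefix is longer.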

For Internet Protocol (IP) routers employing an algorithm
in accordance with the present invention, look-up tables may
have as many as 2^16 entries or more. Also, algorithms
employed may generally be evaluated based on worst-case
performance since queuing for header processing is desirably
avoided to provide a specific Quality of Service (QoS).
For the exemplary filter-rule table, a value n may be defined
to denote a number of entries in the table, for example a
multicast forwarding table, corresponding to the n filter rules
r_1 through r_n. An n×n array may be formed in a memory with
each entry representing the highest-priority filter-rule rectangle
of the n filter rules r_1 through r_n enclosing a point
corresponding to the coordinates represented by the entry.
An exemplary classification (i.e., look-up) algorithm that
employs this simple table may employ two binary searches,
one for each of the dimensions. This exemplary classification
algorithm may require O(log n) time and O(n^2) memory
space. The O(n^2) memory space is due to one rectangle
being represented in O(n) locations. Such a simple table might
not be preferred, however, for a high-speed router when the
number of filtering rules is n=2^16 or greater since the
required memory space or memory access time may be
excessive.
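
A minimal sketch of such a simple table follows, assuming inclusive integer ranges in both dimensions and "larger number means higher priority"; the function names and the grid construction from range endpoints are illustrative choices, not details given in the patent. Each grid cell stores the highest-priority enclosing rule, and a look-up is two binary searches.

```python
from bisect import bisect_right

def build_table(rules):
    """rules: list of (name, (s_low, s_high), (d_low, d_high), priority)."""
    # Cut each axis at every point where some rule starts or stops.
    s_cuts = sorted({c for _, (lo, hi), _, _ in rules for c in (lo, hi + 1)})
    d_cuts = sorted({c for _, _, (lo, hi), _ in rules for c in (lo, hi + 1)})
    table = [[None] * (len(d_cuts) - 1) for _ in range(len(s_cuts) - 1)]
    for i in range(len(s_cuts) - 1):
        for j in range(len(d_cuts) - 1):
            best = None
            for name, (s_lo, s_hi), (d_lo, d_hi), prio in rules:
                # A rule covers a whole cell iff it covers the cell's corner.
                if s_lo <= s_cuts[i] <= s_hi and d_lo <= d_cuts[j] <= d_hi:
                    if best is None or prio > best[0]:
                        best = (prio, name)
            table[i][j] = best[1] if best else None
    return s_cuts, d_cuts, table

def lookup(s_cuts, d_cuts, table, s, d):
    """Two binary searches, one per dimension, then one table read."""
    i = bisect_right(s_cuts, s) - 1
    j = bisect_right(d_cuts, d) - 1
    if 0 <= i < len(s_cuts) - 1 and 0 <= j < len(d_cuts) - 1:
        return table[i][j]
    return None

grid_rules = [("a", (0, 7), (0, 7), 1), ("b", (4, 7), (2, 5), 2)]
s_cuts, d_cuts, table = build_table(grid_rules)
```

The look-up is fast, but the table grows quadratically: an n×n array for n=2^16 rules already has 2^32 entries, which is the memory cost the text warns about.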



Consequently, preferred embodiments of the present 
invention employ preprocessing of the filter-rules based on 
prefix length as a power of 2 in one dimension and decom- 
position of overlapping segments into non-overlapping 
intervals in the other dimension to form the filter-rule table. 
A packet filter of the present invention first searches in one 
dimension through filter rules and arranges the correspond- 
ing filter-rule rectangle segments according to prefix length. 
Then, in the other dimension, the overlapping filter rectangle 
segments are decomposed into non-overlapping intervals, 
and the highest priority filter-rule overlapping each non-
overlapping interval is associated with that interval. A filter-
rule table is then constructed with entries ordered according
to prefix length and non-overlapping interval, each entry 
associated with a particular filter-rule. This filter-rule table is 
constructed within a router prior to processing of received 
packets. Packet classification in accordance with the present 
invention then processes the received packets using the field 
or other parameter information in the packet. The field or 
other parameter information is matched to the filter-rule 
table entries to identify the filter-rule rectangle associated 
with the filter-rule to be applied to the packet. 
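
The preprocessing and classification flow described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the patent's implementation: prefixes are bit strings, d-ranges are inclusive integers, a larger number means higher priority, and prefix lengths are probed linearly from longest to shortest. All function names are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict

def decompose(segments):
    """Split overlapping segments into non-overlapping intervals.

    segments: list of (low, high, priority, name), inclusive bounds.
    Returns (starts, owners): interval k begins at starts[k] and is owned by
    the highest-priority segment covering it (None if uncovered).
    """
    starts = sorted({c for lo, hi, _, _ in segments for c in (lo, hi + 1)})
    owners = []
    for start in starts:
        covering = [(p, n) for lo, hi, p, n in segments if lo <= start <= hi]
        owners.append(max(covering)[1] if covering else None)
    return starts, owners

def build_filter_table(rules):
    """rules: list of (name, s_prefix, (d_low, d_high), priority)."""
    by_prefix = defaultdict(list)
    for name, s_prefix, (d_lo, d_hi), prio in rules:
        by_prefix[s_prefix].append((d_lo, d_hi, prio, name))
    return {prefix: decompose(segs) for prefix, segs in by_prefix.items()}

def classify(table, s, d, width=4):
    """Probe prefixes of S from longest to shortest; binary-search D in each."""
    bits = format(s, f"0{width}b")
    for length in range(width, -1, -1):
        entry = table.get(bits[:length])
        if entry is None:
            continue
        starts, owners = entry
        k = bisect_right(starts, d) - 1
        if k >= 0 and owners[k] is not None:
            return owners[k]
    return None

table = build_filter_table([
    ("r1", "0", (0, 15), 1),
    ("r2", "01", (2, 8), 1),
    ("r3", "01", (6, 12), 2),
    ("r4", "011", (9, 10), 1),
])
```

Here the two segments under prefix `01` overlap on 6 to 8, and after decomposition that interval belongs to `r3`, the higher-priority rule. Because k segments contribute at most 2k cut points, the table stays linear in the number of rules, in contrast to the quadratic simple table.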

In accordance with the present invention, values for each
s_m of r_m=(s_m,d_m) in the s-dimension are desirably ranges that
are a power of two. Consequently, prefix values ("prefixes")
define ranges ("prefix ranges") that are a power of two. The
length of a prefix is the number of specified bits of the prefix.
The prefix range is between a lower bound defined by the
prefix and unspecified bits set to logic "0" and the upper
bound defined by the prefix and unspecified bits set to logic
"1". The length may be represented by a binary value. The
d_m may be single points, ranges defined in a manner similar
to prefix ranges in the s-dimension, and/or ranges defined as
continuous ranges. When multiple matches of a same length
prefix occur for a specific value of s_m, the query point (S, D)
is associated with the highest priority filter rule having the
matching prefix of d_m, if an overlap also occurs in the
d-dimension.
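
The lower and upper bounds of a prefix range follow mechanically from this definition. A small sketch, assuming prefixes are written as bit strings and the field width is at least one bit (the function name is hypothetical):

```python
def prefix_range(prefix, width):
    """Return the inclusive (low, high) range covered by a bit-string prefix.

    The unspecified low-order bits are filled with 0s for the lower bound
    and with 1s for the upper bound. Assumes width >= 1.
    """
    unspecified = width - len(prefix)
    if unspecified < 0:
        raise ValueError("prefix longer than field width")
    low = int(prefix + "0" * unspecified, 2)
    high = int(prefix + "1" * unspecified, 2)
    return low, high
```

The size of the range, `high - low + 1`, is always `2 ** unspecified`, which is why every prefix range is a power of two.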

FIG. 3 illustratively depicts prefixes and prefix ranges of
a field in an s-dimension where the prefix ranges are a power
of two. Field values s, which may be source addresses, vary
from 000 to 111 (binary). An address may be a point (e.g.,
010) or within a range (e.g., 010 to 101). For a special case,
prefix ranges may be a power of 2. For example, if a prefix
range is defined as 0xx, the prefix, represented as a single
value 0, specifies the range 000 to 011. For this example, the
prefix has a length of 1 corresponding to one specified bit.
Two prefixes of length 1 are possible: I_1^0 and I_1^1. If the
prefix has two bits, or a length of 2, then four prefixes are
possible: I_2^0, I_2^1, I_2^2, and I_2^3. Prefixes of different length
define prefix ranges that are different powers of two. Prefix
ranges of the same length do not overlap.
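
The counting in this example is easy to reproduce. The sketch below, with hypothetical helper names, enumerates the prefixes of a given length and lists the prefixes that contain a given field value; it shows that exactly one prefix of each length matches a value, which is another way of saying that same-length prefix ranges do not overlap.

```python
from itertools import product

def prefixes_of_length(length):
    """All bit-string prefixes with `length` specified bits, in ascending order."""
    return ["".join(bits) for bits in product("01", repeat=length)]

def matching_prefixes(value, width):
    """The prefixes of each length 1..width whose range contains value."""
    bits = format(value, f"0{width}b")
    return [bits[:length] for length in range(1, width + 1)]
```

For the 3-bit field of the example, `prefixes_of_length(1)` yields the two prefixes `0` and `1`, `prefixes_of_length(2)` yields four, and the point 010 is matched by `0`, `01`, and `010`, one prefix per length.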

FIG. 4 illustrates an example of decomposition in the
d-dimension of a 2-dimensional filter-rule rectangle into
1-dimensional overlapping segment sets and then into non-
overlapping intervals. As described previously, values for
each d_m of filter rule r_m=(s_m,d_m) in the d-dimension may be
any contiguous range and are not necessarily restricted to
prefix ranges only. FIG. 4 shows a horizontal axis 429 for the
d-dimension representing, for example, parameter values for
IP destination addresses. The process searches through each
of the applicable filter rules r_1, ..., r_4 to be implemented in
the router for each dimension, and the process may be
implemented before processing of arriving packets. Each of
the filter rules r_1, ..., r_4 specifies field ranges such as
d_1, ..., d_4 for the d-dimension applicable to the particular
parameter of the packet header.

Field ranges d_1, ..., d_4 are projected as overlapping
horizontal line segments, with each segment specifying a
start point "b_i" and end point "q_i" of a range for a particular
corresponding filter rule (i an integer greater than 0). For
example, d_1 specifies a first range of source addresses on a
first segment defined by start point "b_1" and end point "q_1"
for filter rule r_1. Segments may overlap, such as those of d_1
and d_2. Consequently, segments are decomposed into non-
overlapping intervals I_j (j an integer greater than 0).
Therefore the segment defined by start point "b_1" and end
point "q_1" for filter rule r_1 has a single associated interval I_1,
but the segment defined by start point "b_2" and end point
"q_2" for filter rule r_2 has three intervals I_1, I_2, and I_3
associated with filter rule r_2. These three non-overlapping
intervals I_1, I_2, and I_3 are a result of decomposing the
overlapped segments of filter rules r_1, r_2 and r_3 at start or
end points. It should be understood that for each filter rule,
a range of source addresses and a range of destination
addresses, for example, may be specified.

As described previously, values in the s-dimension of
each rectangle desirably have lengths of a power of 2 when
the values in the s-dimension are defined as prefix ranges.
Ranges in dimensions being prefix ranges provide constraints
such as illustrated in FIG. 3. When prefix range
intervals have lengths which are powers of two, arbitrary
overlapping of filter-rules for the dimension does not occur
since two prefixes of the same length do not overlap. Also,
a prefix range interval starts from an even-value point and
terminates at an odd-value point. Consequently, a set of
prefix ranges form several distinct cells distinguished by the
length of the prefix or, equivalently, the length of the range.
Further, values for each d_m of filter rule r_m=(s_m,d_m) in the
d-dimension may be any contiguous range, such as illustrated
in FIG. 4, and are not necessarily restricted to prefix
ranges unless the value for d_m is defined as a prefix range.
However, modifying the packet filter in accordance with the
present invention to define values for d_m as prefix ranges
may be desirable, such as if destination addresses are
concatenated with layer-4 destination ports or some other
similar header field.

In accordance with the present invention, filter-rule table
cells for prefix ranges and associated non-overlapping intervals
are defined containing pointers to filter-rules as entries
in the filter-rule table in the following manner. Given each
rule r_i=(s_i,d_i), for the field range s_i that is an integer power
of 2, the length is defined as l_si bits and for the field range
d_i the length is defined as l_di bits. The maximum values of
lengths l_si and l_di are defined as l_sMAX and l_dMAX, respectively.
The set of prefixes having a length of i bits are
denoted as P_i, 0<i≤l_sMAX. As described with respect to FIG.
3, there may be several different prefixes of a given length
i, i.e. the set of prefixes of length 1 (P_1) may have up to two
elements, prefixes starting with "0" and prefixes starting
with "1". The value np_i denotes the number of elements in
the set of prefixes of length i (P_i) that are present in the
lookup table. The elements of the set of prefixes of length i
(P_i) may be numbered in ascending order of their values;
consequently, the np_i prefixes of the set P_i are defined as the
set {P_i^1, P_i^2, ..., P_i^np_i}.

The set of filter-rule rectangles RP={RP_1, RP_2, ...,
RP_l_sMAX} is defined such that each RP_i is a subset of the set
of n filter rule rectangles RP such that subset RP_i includes all
filter-rule rectangles formed from s value prefixes having a
length of i bits. Further, each subset RP_i may be defined as
the union of the sets of filter-rule rectangles RP_i^j={(P_i^j,d_i^1),
(P_i^j,d_i^2), ...} where each filter-rule rectangle RP_i^j has the
jth prefix of length i (P_i^j) as a side of the filter-rule rectangle
in the s-dimension. Therefore, each of the filter-rule rectangles
in set RP_i^j may be associated with each prefix P_i^j (j an
integer and 1≤j≤np_i).

The value d_i^j in the d-dimension of the set of filter-rule
rectangles RP_i^j={(P_i^j,d_i^1), (P_i^j,d_i^2), ...} is a range in the
d-dimension that may overlap other ranges. As defined, the
[illegible] and each of the [illegible] formed [illegible]
rectangles in the set RP_i [illegible] s-dimension may be
defined to have higher priority than other [illegible] with
shorter prefix length since they are more specific.
Consequently, if filter-rule rectangles in RP_[illegible] and
[illegible] match a packet EP=(S, D) based on field values in
the s-dimension, then the [illegible] in RP_[illegible] is
applied to packet EP since rectangles in RP_[illegible] are
formed with [illegible] prefixes than those rectangles formed
in RP_[illegible].

For the d-dimension, the size of the list of the set of d_i^j
values may be defined as k_i^j (an integer greater than 0).
From each [illegible] ranges in a set [illegible], a list of
non-overlapping intervals ID_i^j is formed along
the axis of the d-dimension from filter-rule segments Id_i^j
corresponding to the values of d_i^j. The size of this new set
of intervals ID_i^j may be K_i^j≤2k_i^j+1. By representing the
original k_i^j overlapping intervals as non-overlapping
intervals, a memory space requirement of the packet filter
may be increased by only a constant factor of 2.

For the d-dimension, if the values for d_i^j are defined to be
prefix ranges, the projected filter-rule segments Id_i^j
along the d-dimension axis do not overlap, and so the Id_i^j
become the list of non-overlapping intervals ID_i^j.

For the general case, replacing overlapping intervals by
non-overlapping intervals allows a search algorithm to
locate the field value D from the query point (S, D) on one
of these non-overlapping rectangles during the search procedure.
The search algorithm then retrieves the associated
enclosing rectangle of the non-overlapping rectangles representing
the filter rule to be applied to the packet.
Consequently, when many filter-rule rectangles overlap a
given interval in the d-dimension, the particular filter-rule
rectangle associated with the given interval when non-
overlapping intervals are formed is the filter-rule rectangle
with the highest priority that overlaps the interval.

FIG. 5 illustrates a 2-dimensional space for an exemplary
packet filter in accordance with the first embodiment. FIG.
5 shows a total of np_1=2 prefixes of length i equal to 1 (i.e.,
0xxx and 1xxx). For the set of rectangles RP_1 with prefix
length i equal to 1, the corresponding set of filter-rule
rectangles is RP_1={e1, e6}. Also shown is a total of np_2=1
prefixes of length i equal to 2 (i.e., 01xx) for the set RP_2 of
filter-rule rectangles formed with prefixes of length i equal
to 2. The set RP_2 includes the filter-rule rectangles {e2, e3,
e4}. These filter-rule rectangles may overlap on the axis of
the d-dimension. Similarly, the set of filter-rule rectangles RP_3
with prefix of length i equal to 3 (i.e., 011x) contains one
filter-rule rectangle e5.

For the illustration shown in FIG. 5, the set of intervals
given a prefix length of 2 that are created after this overlap
elimination for each Id_2^j is ID_2^j={a_0, a_1, ..., a_[illegible]}.
Filter-rule rectangles e2 and e3 overlap in the d-dimension.
Filter-rule rectangle e3 of the set of rectangles RP_2^j is
associated with the interval a_[illegible] since this filter-rule
rectangle may be defined to have the higher priority than
filter rule rectangle e2.
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Consequently, only this filter-rule rectangle e3 is associated FIG. 6, first, at step 701 the overlapping intervals Id/ are 

with interval ^ even though another filter-rule rectangle sorted into an ascending sequence based on interval starting 

with lower priority overlaps this range a^. points. Then, at step 702, for all j, if an overlapping interval 

For the exemplary system of FIG. 5, a packet EP with Id/ starts or ends, an assigned, non-overlapping interval ID/ 

header field values (S=0110, D=0101) arrives. First, a 5 is generated for previous interval. For step 604 of FIG. 6, at 

matching prefix of length 1 from S-(0) is found and a search s t e p 703, the assigned, non-overlapping intervals ID/ and 

performed for enclosing rectangles formed with this prefix. corresponding pointer to actions for the highest-priority 

The d-dimension is searched and filter-rule rectangle el filter-rule rectangle overlapping this interval are stored in 

shown in FIG. 4 is a first candidate rule, or is the current memor y. Optionally, at step 704 the newly created interval 

solution. Note that rectangles cl and e6 of FIG. 5 are the 1Q and the prcviouslv slorcd adjaccnt mtcrval are compared, 

only rectangles in the set of rectangles with prefixes of and are d jf ^ tWQ mt tQ ^ &amc 

Length equal to 1. Next, a search for the matching prefix (01) ^ a ncw intcfval , D/ is CTCatcd at mosU wheQ 

is performed over the prefixes of length 2. Rectangle e3 is . . . . ' . , f . u . 

determined to be a better candidate rule since 1) the D value an flapping u^M^uta^Ux, w^k 

of the arriving packet overlaps with the range a2, 2) this n f" sct of mtcrv ^ *?/ 15 ^ "? k £5 K * ^ 

filter-rule rectangle e3 is formed with a longer prefix than 15 of xt of overlapping mtcrvals Id/, 

rule el, and 3) this filter-rule rectangle has higher priority In accordance with the pre-processing algorithm of the 

than other rectangles formed with prefixes of equal or lower packet filter, each filter-rule is associated with a pointer in 

length. Finally, a matching prefix (001) of length 3 is located one or more filter-rule table entries. Each filter-rule pointer 

and a search among rectangles with this prefix is performed, is stored in exactly one address in memory corresponding to 

resulting in the rule of rectangle e5 as the best solution. 20 prefix and prefix length on the s-dimension axis, and one or 

A packet filter of the present invention for a router more addresses corresponding to non-overlapping intervals 

employs an algorithm having two parts. The first part is a on the d-dimension axis. The set of filter-rule rectangles 

pre-processing algorithm that searches through filter rules associated with a prefix is stoned as a list of non-overlapping 

and decomposes the filter rules for each dimension. The first intervals and requires space only proportional to the size of 

part is performed by the router prior to processing of 25 the set. Only O(n) memory space may be utilized to store all 

received packets. A second part is a classification algorithm the rect angles since each rectangle appears only in one set 

that processes the received packets using the field or other and therefore ^ size of lhe mioQ of all xts & 0 ( n \ 

parameter information in accordance with the processed n , , , „, , 

filter rules of the pre-processing algorithm. ,?nce the preprocessing algonthm creates the filter-rule 

An exemplary pre-processing algorithm for a packet filter 30 la r ble > ^ classification algonthm performs a look-up search 

in accordance with the present invention is shown and ° f *e fUter-rule table. FIG. 8 dlustrates an exemplary 

described with respect to FIG. 6 and FIG. 7. The pre- ^w-chart of the classification algonthm of the packet filter, 

processing algorithm performs three operations to decom- ™ e classification algorithm may begin at step 801^ Fust, at 

pose the n filter-rule rectangles. First, the filter-rule rect- 801 ' P refix « 0 [ lcn & h L ' P <=< P < , P/f> are 

angles are separated based on the prefix length in the 35 ide f s ? A • kitiaUy, me value of ima y st ^ fo u m 

s-dimension. Second, for each prefix of length i, all associ- f refi * ^ngth such as i-l. Next, at step 802 the prefix P/ of 

ated filter-rule rectangles are projected onto the correspond- le ^ h 1 wth « s < m&l f h ^ the query point S in the 

ing axis in the d-dimension to obtain first the overlapping ^-dimension * determined. If no match of S wuh s,- in P/ is 

intervals Id/. Third, a set of non-overlapping intervals ID/ tma **t sle J> 802 > ^ algonthm ™ v « to step 805. At 

are created from these the overlapping intervals Id/. The « f te P 805 ' ,h / P refix ; len S th 15 ceremented until the 

non-overlapping intervals may be created by a scan of the kogpst P ref f l ™Z lh * (">• cerement 1 if i< W). 

overlapping intervals from lower to higher coordinates in the Consequently, the c assification algonthm repeats for each 

d dimension prefix length until all prefix lengths have been searched. 

FIG. 6 illustrates a flowchart of an exemplary pre- If a match of S with an s, in P/' is found at step 802, then 

processing algorithm in accordance with the present inven- 45 at ste P 803 ^ stored structure in d-dimension associated 

lion. First, at step 601 the set of prefixes P/ (as defined with p / is marched to find the non-overlapping interval ID/" 

previously) for all i and j, 1 *i^ and l^jl=np„ is m that contains * he query point D in the d-dimension. At step 

stored in memory according to, for example, an efficient trie 804 the 5011111011 1S set as «« P ointer associated with 

representation. Then, at step 602 for each filter-rule having taWe entrv (PAID/* ) (m an integer greater than 0). The 

prefix P/, the corresponding set of filter-rule values d/ in the 50 current soluiion m ^ be " best " solution amon « M P rcfix 

d-dimension are projected as overlapping segments IdL At Ien g ths marched so far if shorter prefix lengths correspond 

step 603, for all P/, (i.e., for all j prefixes of length i, to lowcr P nonl y mles > and the begins at the shortest 

l^Wrand l^jl^np,.), the overlapping segments Id/ P refix ( lowest Parity) and goes to the longest prefix 

are decomposed into a set of non-overlapping intervals ID/. (highest priority). The algonthm then moves to step 805. 

At step 604 a pointer is constructed to identify the highest 55 The number of iterations of the classification algorithm in 

priority filter-rule rectangle overlapping the associated non- the worst case is equal to the largest number of possible 

overlapping interval for all intervals of the set ID/. At step prefix lengths, which is 1^^. Consequently, the total time 

605, the set of non-overlapping intervals ID/ are stored with for searching through all prefix lengths is OQ^^) times the 

associated prefix P/ as table entry in the filter-rule table. time to search a list for a prefix length. In addition, the size 

Each entry of the filter-rule table corresponds to the pointer eo of & Q ^ of ID / for a prefix length may be O(n) since there 

identifying actions to applied to a packet for a corresponding are n filter-rules. Hence, an average 0(log n) time is needed 

filter rule. The list of non-overlapping intervals ID/ may be to search each list for a matching entry. The worst case total 

stored in sorted sequence using either an array or a binary execution time of the exemplary classification algorithm is, 

tree. At step 606, the algorithm returns to step 602 if UI^m**, therefore, 0(l jA14 ^og n). 

or until all prefix lengths P, are processed. 65 However, for large numbers of table entries, worst case 

FIG. 7 is a flowchart illustrating the decomposition of performance may not be sufficient for available processor 

intervals of the steps 603 and 604 of FIG. 6. For slep 603 of speed. For example, if a number of possible prefix lengths 
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\ sMAX is 32 and the number of tabic entries n is 2 18 =256K. 
This exemplary classification algorithm may perform 576 
memory accesses in the worst case, which may be prohibi- 
tively high. An alternative embodiment of the present inven- 
tion employs a trie structure with virtual intervals for storage 
of data in memory to reduce the worst-case time-complexity 
0 C 1 iA«r lo g n) to a time-complexity 0(l JlAMX ). 
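
The pre-processing and classification steps of the first embodiment described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the patented implementation: the function names `decompose`, `build_table` and `classify` are hypothetical, prefixes are modeled as bit strings, d-values are integers, and a lower rule number is assumed to mean a higher priority (the text does not fix a priority encoding).

```python
from bisect import bisect_right

def decompose(segments):
    """Turn overlapping segments (lo, hi, rule) into non-overlapping
    intervals, each tagged with the winning rule (cf. steps 701-704).
    Assumption: the lowest rule number has the highest priority."""
    points = sorted({p for lo, hi, _ in segments for p in (lo, hi)})
    intervals = []
    for lo, hi in zip(points, points[1:]):
        covering = [r for s_lo, s_hi, r in segments if s_lo <= lo and hi <= s_hi]
        if not covering:
            continue                      # gap not covered by any rule
        best = min(covering)
        if intervals and intervals[-1][1] == lo and intervals[-1][2] == best:
            # merge with the adjacent interval pointing to the same rule
            intervals[-1] = (intervals[-1][0], hi, best)
        else:
            intervals.append((lo, hi, best))
    return intervals

def build_table(rules):
    """rules: (prefix_bits, d_lo, d_hi, rule). Returns
    {prefix_length: {prefix_bits: (interval_starts, intervals)}}."""
    grouped = {}
    for prefix, lo, hi, rule in rules:
        grouped.setdefault(prefix, []).append((lo, hi, rule))
    table = {}
    for prefix, segments in grouped.items():
        intervals = decompose(segments)
        starts = [lo for lo, _, _ in intervals]
        table.setdefault(len(prefix), {})[prefix] = (starts, intervals)
    return table

def classify(table, s_bits, d):
    """Search prefix lengths from shortest to longest; the last match wins."""
    solution = None
    for length in sorted(table):
        entry = table[length].get(s_bits[:length])
        if entry is None:
            continue                      # no matching prefix of this length
        starts, intervals = entry
        k = bisect_right(starts, d) - 1   # binary search in the d-dimension
        if k >= 0 and d <= intervals[k][1]:
            solution = intervals[k][2]
    return solution

rules = [("0", 4, 7, 2), ("0", 8, 12, 4), ("0", 8, 15, 5), ("001", 12, 13, 13)]
table = build_table(rules)
print(classify(table, "0010", 13))  # 13
```

Each prefix length costs one binary search, so with l_sMAX=32 and n=2^18 the loop makes up to 32 searches of about log2(n)=18 probes each, which is presumably where the figure of 32×18=576 memory accesses comes from.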

A trie structure may be employed for data storage with a
memory space requirement that may be O(n). Furthermore,
the order of search for the sets of filter-rules RP_1, RP_2, . . . ,
may be organized by increasing order of prefix lengths. For
example, a set of intervals from RP_1 is searched before
searching a set of intervals from RP_2 and so on. The search
proceeds in levels L_i, with a search of sets belonging to RP_1
being on the first level L_1, those in RP_2 being on the second
level L_2, and so on. The number of non-overlapping intervals
in all of RP_i is defined as N_i. The root (i.e., bottom-most)
level L_1 has N_1 non-overlapping intervals, and this level
may be RP_1 with N_1 non-overlapping intervals. The number
of overlapping intervals at each level without introducing
virtual intervals may be O(n). In accordance with the present
invention, introducing "virtual" intervals decreases search
time of the classification algorithm in multiple ordered lists.
If elements of a set of intervals are arranged by employing
virtual intervals as described below, the worst case execution
time may be O(l_sMAX + log n).

A search of the list of non-overlapping intervals at level
L_i, for example, yields a result of the point D, where D is in
an interval ID_i^j. A search of the lists at the next level L_(i+1) is
performed, instead of searching through the remaining inter-
vals at level L_i. In general, the result of the previous search
at level L_i may be used for the search at level L_(i+1), and the
search at level L_(i+1) is performed for only those intervals that
fall in the range of intervals ID_(i+1)^j in level L_(i+1) given by the
interval ID_i^j at L_i. For this case, at level L_(i+1)
there may be O(n/ls) intervals which fall within the range
determined by ID_i^j. Hence, an O(log(n/ls))=O(log n) search
may be needed at every level.

Consequently, virtual intervals at levels L_i<L_(l_sMAX) are
defined in the following manner. The number of intervals N_i
is defined at level L_i. Boundary points that demarcate the N_i
intervals in the d-dimension at level L_i are denoted by y_i^1,
y_i^2, . . . with a maximum of 2N_i such points. Every other
point at level L_i is replicated at level L_(i-1), and up to 2N_i
points are so propagated to level L_(i-1). Although the present
embodiment is described using propagation of every other
point, other embodiments may skip NS points, NS an integer
greater than 1, or may vary the number of points skipped
according to granularity of the pointers used.

The points that were propagated, together with the points
defining original non-overlapping intervals ID_i^j, now define
intervals at level L_(i-1) as new intervals VD_(i-1)^j. These inter-
vals are stored as non-overlapping intervals at level L_(i-1).
Next, for all the intervals at level L_(i-1) and their associated
points, every other point is replicated and propagated as
virtual points to level L_(i-2). This propagation process is
repeated until the root level (i.e., L_1) is reached. Note
that the propagation process is employed to speed up the
search; at each level, the filter-rule rectangles associated
with each non-overlapping interval are as described in the
preprocessing algorithm described previously. Virtual inter-
vals and points that result from propagation are desirably
ignored for association of filter-rule rectangles with non-
overlapping intervals.
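
The propagation of every other boundary point toward the root level can be sketched as follows. This is a minimal illustration under assumptions of our own: the name `propagate` is hypothetical, each level is reduced to a sorted list of integer boundary points, index 0 stands for the root level L_1, and duplicate points are collapsed, whereas the described embodiment may keep duplicates so that pointers can be attached to them.

```python
def propagate(levels):
    """levels[0] is the root level L_1, levels[-1] the longest-prefix level.
    Returns, per level, the sorted boundary points after every other point
    of the next level (original plus virtual) has been copied toward the root."""
    merged = [sorted(set(points)) for points in levels]
    for i in range(len(merged) - 2, -1, -1):
        virtual = merged[i + 1][::2]       # every other point of level i+1
        merged[i] = sorted(set(merged[i]) | set(virtual))
    return merged

levels = [[4, 7, 8, 12, 15], [12, 15], [0, 3, 8, 9, 10, 11, 12, 13, 15]]
for points in propagate(levels):
    print(points)
```

Because each step copies only half of the points it sees, the number of virtual points shrinks geometrically on the way to the root, which is why the extra storage stays within a constant factor.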

The propagation process increases memory space require-
ments by a constant factor, and so the total memory space
requirement is still O(n). A maximum amount of virtual
intervals created and corresponding maximum memory
space is when N_(l_sMAX)=n, n being the number of filter rules,
in which case the number of boundary points at level L_(l_sMAX)
is 2n. The extra memory space due to the propagations is
then as given in equation (1):

$$\left(n + \frac{n}{2} + \frac{n}{4} + \cdots\right) \le 2n \qquad (1)$$

Increasing the memory space by a constant factor,
however, allows for searching of multiple lists (i.e., lists of
non-overlapping intervals at each level) efficiently. A packet
EP=(S, D) arrives at the packet filter and is processed by the
classification algorithm with a filter-rule table organized in
accordance with the alternative embodiment. A first level,
i.e., L_1, list of non-overlapping intervals VD_1^j is searched as
described previously with respect to the classification
algorithm, taking O(log n) time for the worst case. This
search results in locating the given point D in an interval
VD_1^j that may be a virtual interval propagated from the level
L_2. With D localized to this interval, a search in the next
level L_2 searches in the range of intervals given by VD_1^j.
Because every other point has been propagated up from
level L_2, only 2 intervals in VD_2^j may fall within the interval
VD_1^j to which D has been localized. Hence, the search at
level L_2 may be completed in O(1) time. In general, in
moving from level L_i to level L_(i+1), the propagation of
intervals allows enough information gained in the search at
level L_i to be employed so that the search at level L_(i+1) takes
O(1) time. Hence, the worst case execution time of the look-up
algorithm of the alternative embodiment is O(l_sMAX + log n).
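
As a check on the constant-factor claim behind equation (1), the bound can be worked out explicitly. This is a sketch under the worst case described above, in which all 2n boundary points sit at the longest-prefix level and half of the points present at each level are copied to the level before it; the index k, counting propagation steps, is a label introduced here and does not appear in the source.

```latex
% Extra points created by l_{sMAX}-1 propagation steps, halving each time:
\sum_{k=0}^{l_{sMAX}-2} \frac{n}{2^{k}}
  \;=\; 2n\left(1 - 2^{-(l_{sMAX}-1)}\right)
  \;<\; 2n
```

The sum is a finite geometric series with ratio 1/2, so the extra storage never reaches 2n regardless of the number of levels, and the total remains O(n).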

FIGS. 9A and 9B illustrate an example of an alternative
embodiment of the packet filter employing virtual intervals
to reduce search time of the classification algorithm. FIG.
9A illustrates a trie structure employed to search prefix
values of fourteen exemplary filter rules in ascending order
of length. FIG. 9B shows creation of virtual intervals for
levels of a portion of the trie structure shown in FIG. 9A. For
the exemplary embodiment of FIG. 9A and FIG. 9B, Table
1 provides a list of filter-rules with corresponding prefix
values and lengths for source fields and destination field
ranges.

TABLE 1

Filter-Rule   Source         Source          Destination range d
Number        Prefix Value   Prefix length   (lower bound, upper bound)

 1            11*            2               (0,15)
 2            0*             1               (4,7)
 3            00*            2               (12,15)
 4            0*             1               (12,15)
 5            0*             1               (8,15)
 6            10*            2               (8,15)
 7            001*           3               (8,15)
 8            000*           3               (6,7)
 9            000*           3               (4,5)
10            001*           3               (8,9)
11            001*           3               [illegible]
12            001*           3               (10,11)
13            001*           3               (12,13)
14            001*           3               (0,3)


A packet EP with fields S=0010 and D=1101 arrives in the
system. Referring to FIG. 9A, a search of the trie structure
900 (the trie search) in the s-dimension begins at the root
level 901 (level 0) to determine if the source address
(S=0xxx) begins with a 0 (state 902) or a 1 (state 903). This
is a search of the set of prefixes of length 1. The trie search
moves to the state 902 at level 1 corresponding to the prefix
0xxx of length 1. Similarly, at level 2 the trie search
determines if the next bit of the source address (S=00xx) is
a 0 (state 904) or a 1 (state 905). The trie search moves to
the state 904 at level 2 corresponding to the prefix 00xx of
length 2. Finally, at level 3 the trie search of a portion of the
set of prefixes of length 3 determines if the next bit of the
source address (S=001x) is a 0 (state 908) or a 1 (state 909).
The trie search moves to the state 909 at level 3 correspond-
ing to the prefix 001x of length 3. For searches of prefixes,
only a portion of sets of prefixes are searched in the tries.
Consequently, states 903, 906 and 907 are not reached since
the trie search moves from state 901 to state 902, to state
904.
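
The bit-by-bit descent described above can be sketched with a small binary trie. This is a hedged illustration only: the names `build_trie` and `walk`, the nested-dictionary layout and the `"rule_prefix"` marker are assumptions made for this example, and the prefix set is the one that appears in the source-prefix column of Table 1.

```python
def build_trie(prefixes):
    """Build a binary trie from prefix bit strings such as "001"."""
    root = {}
    for prefix in prefixes:
        node = root
        for bit in prefix:
            node = node.setdefault(bit, {})
        node["rule_prefix"] = True         # at least one rule uses this prefix
    return root

def walk(root, s_bits):
    """Return the prefixes of s_bits present in the trie, shortest first."""
    matched, node, path = [], root, ""
    for bit in s_bits:
        if bit not in node:
            break                          # no longer prefix can match
        node = node[bit]
        path += bit
        if node.get("rule_prefix"):
            matched.append(path)
    return matched

trie = build_trie(["11", "0", "00", "10", "001", "000"])
print(walk(trie, "0010"))  # ['0', '00', '001']
```

For S=0010 the walk visits one state per level and never touches the branches for prefixes starting with 1, which mirrors the path through states 902, 904 and 909.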

FIG. 9B illustrates an example of virtual intervals and
point propagation to reduce search time of the classification
algorithm. First, non-overlapping intervals in the
d-dimension are shown for selected states at each level. For
example, at level 1, state 902 corresponds to the prefix of
length 1 being 0xxx. The filter-rules of this prefix 0xxx
(from Table 1) are rules 2, 4 and 5 with respective filter-rule
segments (decimal ranges in the d-dimension) of (4,7),
(8,12) and (8,15). These filter-rule segments are then decom-
posed into non-overlapping intervals (4,7), (8,12) and (12,
15). Without virtual intervals, the trie search at level 1
searches these three intervals to find the value D=1101 (i.e.,
13 decimal) included in the third non-overlapping interval
(12,15) associated with rule 5. However, for the next level
2, the information of this search is lost.

Referring to FIG. 9B, the non-overlapping intervals of the
highest level, level 3, are shown for the states 908 and 909.
Points of these original, non-overlapping intervals at level 3
are propagated to the non-overlapping intervals at level 2.
Brackets in FIG. 9B indicate original, non-overlapping
intervals. For the example shown, alternate points of the
intervals of the left state 908 (next bit 0) and right state 909
(next bit 1) are inserted into the non-overlapping intervals of
the states of the next level 2, but as described previously the
present invention is not so limited. For example, virtual
intervals (0,3), (3,4), (5,6), (6,9), (9,11), (11,12), (12,13) and
(13,15) are created from the original non-overlapping inter-
val (12,15). Next, the alternate points of the intervals of state
904 are propagated to level 1, and as shown, propagated
points, such as 12, may be duplicated in a level, since
pointers are to be associated with the intervals. Normally,
points of left and right states are propagated, but for the
example of FIG. 9A and FIG. 9B, no rules or intervals are
associated with state 905.

As the trie search of prefixes as shown in FIG. 9A
progresses, the search of intervals is as shown in FIG. 9B.
At level 1, state 902, the intervals in the d-dimension are
searched and the value of D=1101, 13 decimal, is deter-
mined to be included in the interval (12,12,15). At level 2,
after the prefix search moves to state 904, the pointer
associated with propagated point 12 in interval (12,12,15) is
employed to limit the search in level 2 to interval (12,13,15).
At level 3, after the prefix search moves to state 909, the
pointer associated with propagated point 13 in interval
(12,13,15) is employed to limit the search in level 3 to
interval (12,13), associated with rule 13 of Table 1.

As described, the algorithm for computing the filters is
largely implemented in hardware and may be manufactured
in application specific integrated circuit (ASIC) form, or as
a field programmable gate array (FPGA) that, consequently,
may operate at very high speed. FIG. 10 illustrates the
hardware system 1000 for implementation of the packet
filter in accordance with the present invention in a packet
forwarding engine or router, including an input line 1005 for
receiving an incoming packet and a bi-directional CPU
interface line 1010 representing control and timing lines for
purposes of illustration. The incoming packet is input to a
pipeline register 1025 for temporary storage and is also input
to each classification processor 1050. Classification proces-
sor 1050 employs memory 1030 to identify a filter-rule to be
applied to the incoming packet. Field processor 1035
updates fields of the packet stored in pipeline register 1025
based on the identified filter-rule to be applied to the
incoming packet. The details of classification processor
1050 are now described with reference to FIG. 11.

FIG. 11 shows a classification processor 1050 that
receives the incoming packet and stores field parameters,
e.g., source address and destination addresses S and D, in a
register 1176. Under the control of filter processor 1160,
optional memory control device 1165, and associated
memory 1030, the search of the classification algorithm is
performed whereby non-overlapping interval information
from memory 1030 is provided to the register 1179 for each
prefix length. Comparator 1180 performs a comparison to
ascertain each interval associated with the D value of the
received packet. After the correct solution for a filter-rule
rectangle is found, its corresponding bitmap vector contain-
ing potential filter-rule actions is provided from register
1179 along line 1190. From the resultant bitmap vector, the
CPU will apply the rule of highest priority, and performs the
action dictated by the filter rule upon the received packet
stored in the pipeline register 1025. Thus, the packet may be
dropped or forwarded to another destination on output line
1015.

The preprocessing algorithm of the present invention may
be implemented in the classification processor by filter-rule
processing and table processing modules. The filter-rule
processing module may assign filter-rules to prefix values
and lengths in one dimension, project the filter-rule seg-
ments in the other dimension, and decompose the filter-rule
segments into non-overlapping intervals. The table-
processing module may be employed to coordinate memory
organization and storage, generating the necessary pointers
with non-overlapping intervals for particular prefix value
addressing schemes.

An example memory organization for the system is illus-
trated in FIG. 12, which depicts a filter-rule table having a
plurality of interval lists in one dimension corresponding to
each prefix length of another dimension, which may be
associated with the following respective filter parameters: 1)
destination addresses, and 2) source address. Entries of the
filter-rule table are generated as described previously, i.e.,
with respect to FIGS. 6 and 7, and addressed by prefix values
1259a-1259d. Each filter-rule table is shown to include an
array 1260a-1260d of intervals to be searched correspond-
ing to prefix values as described above with reference to
FIG. 8, and the corresponding filter actions 1261a-1261d
and the pointers 1262a-1262d.
While embodiments of the present invention are shown
and described with respect to searches in a given dimension
ordered from shortest to longest length, as would be appar-
ent to one skilled in the art the present search algorithms
and/or filter-rule table structures may be varied. For
example, the search may be from the longest to the shortest
prefix length, or from initial to final prefix values in an
ordered list of the set of prefix values. Further, matching of
packet field values with prefix values and interval values
is described herein using binary search techniques, but the
present invention is not so limited. As would be apparent to
one skilled in the art, other search techniques to match
values may be employed, such as employing a perfect hash
method.

It will be further understood that various changes in the
details, materials, and arrangements of the parts which have
been described and illustrated in order to explain the nature
of this invention may be made by those skilled in the art
without departing from the principle and scope of the
invention as expressed in the following claims.

What is claimed is:

1. Apparatus for associating at least one filter rule with a
packet, each filter rule and the packet characterized by
values in first and second dimensions, the filter rule to be
applied to the packet by a router in a communications
network, the apparatus comprising:
a storage medium adapted to store a filter-rule table, each
entry of the filter-rule table corresponding to a prefix
value having a length in the first dimension and at least
one interval in the second dimension; and
a classification processor comprising:
a comparator adapted to identify each prefix value
matching the value of the packet in the first
dimension, and
a filter processor adapted to retrieve, from the filter-rule
table, each interval associated with each prefix value
identified by the comparator containing the value of
the packet in the second dimension,
wherein the filter processor identifies as a solution
interval the interval associated with the prefix length
characterized by an associated predetermined metric
and containing the second field, and
wherein the classification processor associates the filter
rule corresponding to the solution interval with the
packet.

2. The invention as recited in claim 1, wherein the
classification processor further comprises a pre-processor
including:
a filter-rule processing module adapted to:
assign each filter-rule to one or more prefix values
based on the values in the first dimension,
project, for each prefix value having the same length,
values of each corresponding filter rule of the prefix
value onto the second dimension to define at least
one filter-rule segment and
decompose each filter-rule segment into one or more
non-overlapping intervals associated with each pre-
fix value of the same length in the second dimension;
and
a table-processing module adapted to generate a pointer
for each corresponding non-overlapping interval to
identify an included filter-rule, the table-processing
module adapted to store the pointer as an entry of the
filter-rule table associated with a prefix value length
and a non-overlapping interval.

3. The invention as recited in claim 2, wherein:
the filter-rule processing module further comprises:
assigning means for assigning each prefix value of the
same length to a corresponding level;
first projecting means for projecting, for the level
having prefix values of a first length, values of each
corresponding filter rule onto the second dimension
to define at least one filter-rule segment;
second projecting means for projecting, in each level
beginning at the level having prefix values having a
second length, 1) values of each corresponding filter
rule onto the second dimension to define at least one
filter-rule segment in a current level, and 2) selected
points of the at least one non-overlapping interval in
the previous level so as to define at least one virtual
interval in the second dimension; and
interval forming means for forming each filter-rule
segment and each virtual interval of the current level
into one or more non-overlapping intervals associ-
ated with each prefix value having the same length.

4. The invention as recited in claim 3, wherein the first and
second lengths are either 1) the longest and next longest
lengths in a descending prefix length order, respectively, or
2) the shortest and next shortest lengths in an ascending
prefix length order, respectively.

5. The invention as recited in claim 3, wherein the second
projecting means projects, as selected points, every Nth
point that defines either a start point or a stop point of each
non-overlapping interval in the previous level, N an integer
greater than 1.

6. The invention as recited in claim 2, wherein the values
of each filter rule in the second dimension are at least one
range being a power of 2, each range being projected as a
corresponding filter-rule segment to form the non-
overlapping interval in the second dimension.

7. The invention as recited in claim 1, wherein the values
of each filter rule are field ranges, the field ranges in the first
dimension being a power of two, and each prefix length
defines a number of specified bits of the field range.

8. The invention as recited in claim 1, wherein an entry of
the filter-rule table of the storage medium includes a pointer
identifying at least one filter rule contained in the corre-
sponding non-overlapping overlapping interval.

9. The invention as recited in claim 8, wherein each
filter-rule has an associated priority, and the pointer identi-
fies the filter-rule with the highest associated priority con-
tained in the corresponding non-overlapping interval.

10. The invention as recited in claim 8, wherein the values
of each filter rule are field ranges, the field ranges in the first
dimension being a power of two, and each prefix length
defines a number of specified bits of the field range.

11. The method as recited in claim 1, wherein the asso-
ciated predetermined metric is either the prefix value having
the longest prefix length, the shortest prefix length or the
prefix length having a highest priority.

12. A method of associating at least one filter rule with a
packet, each filter rule and the packet characterized by
values in first and second dimensions, the filter rule to be
applied to the packet by a router in a communications
network, the method comprising the steps of:
a) providing a filter-rule table, each entry of the filter-rule
table corresponding to a prefix value having a length in
the first dimension and at least one interval in the
second dimension;
b) identifying each prefix value matching the value of the
packet in the first dimension;
c) retrieving, from the filter-rule table, each interval
associated with each prefix value identified in step b)
containing the value of the packet in the second dimen-
sion;
d) identifying, as a solution interval, the interval associ-
ated with the prefix value characterized by an associ-
ated predetermined metric and containing the value of
the packet in the second dimension; and
e) associating the filter rule corresponding to the solution
interval with the packet.

13. The method as recited in claim 12, wherein step a)
comprises the steps of:
f) assigning each filter-rule to one or more prefix values
based on the values in the first dimension;
g) projecting, for each prefix value having the same
length, values of each corresponding filter rule of the
prefix value onto the second dimension to define at
least one filter-rule segment;
h) decomposing each filter-rule segment into one or more
non-overlapping intervals associated with each prefix
value of the same length in the second dimension;
i) generating a pointer for each corresponding non-
overlapping interval to identify an included filter-rule;
and
j) storing the pointer as an entry of the filter-rule table
associated with a prefix value length and a non-
overlapping interval.

14. The method as recited in claim 13, wherein: 
step g) further comprises the steps of: 

g1) assigning each prefix value of the same length to a 
corresponding level; 

g2) projecting, for the level having prefix values having 
a first length, values of each corresponding filter rule 
onto the second dimension to define at least one 
filter-rule segment, 

g3) projecting, in each level beginning at the level 
having prefix values having a second length, 1) 
values of each corresponding filter rule onto the 
second dimension to define at least one filter-rule 
segment in a current level, and 2) selected points of 
the at least one non-overlapping interval in the 
previous level so as to define at least one virtual 
interval in the second dimension; and 
step h) further comprises the step of: 

h1) forming each filter-rule segment and each virtual 
interval of the current level into one or more 
non-overlapping intervals associated with each prefix 
value having the same length. 

15. The method as recited in claim 14, wherein, for steps 
g2) and g3), the first and second lengths are either 1) the 
longest and next longest lengths in a descending prefix 
length order, respectively, or 2) the shortest and next shortest 
lengths in an ascending prefix length order, respectively. 

16. The method as recited in claim 14, wherein step g3) 
projects, as selected points, every Nth point that defines 
either a start point or a stop point of each corresponding 
non-overlapping interval in the previous level, N an integer 
greater than 1. 
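
The point selection recited in claim 16 can be illustrated with a minimal sketch. The function name, the representation of intervals as (start, stop) pairs, and the choice to count from the first point (so that the Nth, 2Nth, ... points are kept) are all assumptions made for illustration; the claim fixes none of them.

```python
def select_every_nth_point(intervals, n):
    """Return every Nth start/stop point of the previous level's intervals.

    intervals: non-overlapping (start, stop) pairs from the previous level.
    n: integer greater than 1, as the claim requires.
    """
    if not isinstance(n, int) or n <= 1:
        raise ValueError("N must be an integer greater than 1")
    # Collect every point that is a start point or a stop point.
    points = sorted({p for start, stop in intervals for p in (start, stop)})
    # Keep the Nth, 2Nth, 3Nth, ... point (assumed offset, see lead-in).
    return points[n - 1::n]
```

With three intervals (0, 3), (4, 9) and (10, 15) and N = 2, the six end points are thinned to three, which are then the only points carried into the current level to define virtual intervals.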

17. The method as recited in claim 13, wherein the values 
of each filter rule in the second dimension are at least one 
range being a power of 2, the projecting step g) projects each 
range as a corresponding filter-rule segment in the second 
dimension, and the decomposing step h) forms the 
non-overlapping interval from the corresponding filter-rule 
segment projected in step g). 

18. The method as recited in claim 12, wherein the values 
of each filter rule are field ranges, the field ranges in the first 
dimension being a power of two, and each prefix length 
defines a number of specified bits of the field range. 
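
As a hedged illustration of how a power-of-two field range corresponds to a prefix length, the sketch below converts an inclusive range into a prefix value and a number of specified bits. The function name, the 32-bit default field width, and the requirement that the range be aligned to its size are assumptions for the example and are not stated in the claim.

```python
def prefix_of_range(lo, hi, width=32):
    """Map an inclusive, power-of-two-sized range to (prefix_value, prefix_len)."""
    if not 0 <= lo <= hi < (1 << width):
        raise ValueError("range must lie inside the field")
    size = hi - lo + 1
    if size & (size - 1):
        raise ValueError("range size must be a power of two")
    if lo % size:
        raise ValueError("range must be aligned to its size (assumption)")
    free_bits = size.bit_length() - 1   # bits left unspecified by the range
    prefix_len = width - free_bits      # number of specified bits
    prefix_value = lo >> free_bits      # the specified bits themselves
    return prefix_value, prefix_len
```

For a 32-bit field, the range 0xC0A80000 to 0xC0A8FFFF covers 2^16 values, so 16 bits are specified and the prefix value is 0xC0A8.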

19. The method as recited in claim 12, wherein, for the 
filter-rule table provided in step a), an entry of the filter-rule 
table associated with a prefix value length and a 
non-overlapping interval includes a pointer identifying at least 
one filter rule contained in the corresponding 
non-overlapping interval. 

20. The method as recited in claim 19, wherein each 
filter-rule has an associated priority, and the pointer 
generated in step i) identifies the filter-rule with the highest 
associated priority contained in the corresponding 
non-overlapping interval. 

21. The method as recited in claim 19, wherein the values 
of each filter rule are field ranges, the field ranges in the first 
dimension being a power of two, and each prefix length 
defines a number of specified bits of the field range. 

22. The method as recited in claim 12, wherein for step d) 
the associated predetermined metric is either the prefix value 
having the longest prefix length, the shortest prefix length or 
the prefix length having a highest priority. 
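
The three alternative metrics of claim 22 can be sketched as a selection over candidate matches. The function name, the (prefix_len, rule_name) candidate tuples, the metric keywords, and the `level_priority` mapping are hypothetical; the sketch assumes the candidates have already been found by matching the packet in both dimensions.

```python
def choose_solution(candidates, metric="longest", level_priority=None):
    """Pick one candidate match according to a predetermined metric.

    candidates: list of (prefix_len, rule_name) pairs that match the packet.
    metric: "longest", "shortest", or "priority" (hypothetical keywords).
    level_priority: dict mapping prefix_len to a priority, used by "priority".
    """
    if not candidates:
        return None
    if metric == "longest":
        return max(candidates, key=lambda c: c[0])
    if metric == "shortest":
        return min(candidates, key=lambda c: c[0])
    if metric == "priority":
        if level_priority is None:
            raise ValueError("level_priority is required for this metric")
        return max(candidates, key=lambda c: level_priority[c[0]])
    raise ValueError("unknown metric: " + metric)
```

A call such as `choose_solution([(8, "R2"), (16, "R3"), (24, "R7")], "longest")` selects the match at prefix length 24, while "shortest" selects the one at length 8.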

23. A method of storing at least one filter rule with values 
associated with first and second dimensions in a filter-rule 
table comprising the steps of: 
a) assigning each filter-rule to one or more prefix lengths 
based on the values in the first dimension; 

b) projecting, for each prefix length, values of each 
corresponding filter rule of the prefix length onto the 
second dimension to define at least one filter-rule 
segment, 

c) decomposing each filter-rule segment into one or more 
non-overlapping intervals associated with each prefix 
length and corresponding filter rule in the second 
dimension; 

d) generating a pointer for each corresponding non- 
overlapping interval to identify an included filter-rule; 
and 

e) storing the pointer as an entry of the filter-rule table 
associated with a prefix length and a non-overlapping 
interval. 
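
Steps a) through e) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the dictionary keys (`name`, `prefix_len`, `lo`, `hi`, `priority`), the inclusive integer ranges, and the use of a rule name as the stored pointer are all hypothetical. Where several rules cover an interval, the sketch keeps the one with the highest priority, which is one option the dependent claims describe.

```python
from collections import defaultdict

def build_filter_rule_table(rules):
    """Build {prefix_len: [(start, stop, rule_name), ...]} from filter rules."""
    # a) assign each rule to a prefix length from its first-dimension value
    by_length = defaultdict(list)
    for rule in rules:
        by_length[rule["prefix_len"]].append(rule)

    table = {}
    for length, group in by_length.items():
        # b) project: each rule contributes the segment [lo, hi] in dimension 2
        # c) decompose: cut the axis at every segment boundary
        cuts = sorted({r["lo"] for r in group} | {r["hi"] + 1 for r in group})
        entries = []
        for start, end in zip(cuts, cuts[1:]):
            covering = [r for r in group
                        if r["lo"] <= start and end - 1 <= r["hi"]]
            if not covering:
                continue  # gap between segments, no rule applies
            # d) generate a pointer to an included rule (highest priority here)
            best = max(covering, key=lambda r: r["priority"])
            # e) store it under the prefix length and non-overlapping interval
            entries.append((start, end - 1, best["name"]))
        table[length] = entries
    return table
```

For two rules of prefix length 8 covering 0 to 15 (priority 1) and 8 to 23 (priority 2), the decomposition yields the intervals 0 to 7, 8 to 15 and 16 to 23, and the overlapping middle interval points at the higher-priority rule.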

24. The method as recited in claim 23, wherein: 

step b) further comprises the steps of: 

b1) assigning each prefix value of the same length to a 
corresponding level; 

b2) projecting, for the level having prefix values of a 
first length, values of each corresponding filter rule 
onto the second dimension to define at least one 
filter-rule segment, 

b3) projecting, in each level beginning at the level 
having prefix values having a second length, i) 
values of each corresponding filter rule onto the 
second dimension to define at least one filter-rule 
segment in a current level, and ii) selected points of 
the at least one non-overlapping interval in the 
previous level so as to define at least one virtual 
interval in the second dimension; and 
step c) further comprises the step of: 

c1) forming each filter-rule segment and each virtual 
interval of the current level into one or more non- 
overlapping intervals associated with each prefix 
value having the same length. 

25. The method as recited in claim 24, wherein, for steps 
b2) and b3), the first and second lengths are either 1) the 
longest and next longest lengths in a descending prefix 
length order, respectively, or 2) the shortest and next shortest 
lengths in an ascending prefix length order, respectively. 

26. The method as recited in claim 24, wherein step b3) 
projects, as selected points, every Nth point that defines 
either a start point or a stop point of each corresponding 
non-overlapping interval in the previous level. 

27. The method as recited in claim 23, wherein the values 
of each filter rule are field ranges, the field ranges in the first 
dimension being a power of two, and each prefix length 
defines a number of specified bits of the field range. 

28. The method as recited in claim 23, wherein each 
pointer stored in the filter-rule table in step e) identifies each 
filter rule contained in the non-overlapping interval. 

29. The method as recited in claim 23, wherein each 
pointer stored in the filter-rule table in step e) identifies the 
filter-rule with the highest associated priority contained in 
the corresponding non-overlapping interval. 

30. The method as recited in claim 23, wherein the values 
of each filter rule in the second dimension are at least one 
range being a power of 2, the projecting step b) projects each 
range as a corresponding filter-rule segment in the second 
dimension, and the decomposing step c) forms the non- 
overlapping interval from the corresponding filter-rule seg- 
ment projected in step b). 






